Quantum Information
1. Introduction
2. Quantum mechanics essentials
3. Measurement and uncertainty
4. Qubits and the Bloch sphere
5. Bipartite systems
6. Entanglement applications
7. Information theory
7.1. Classical Information and Shannon Entropy
7.2. Quantum Entropy
7.3. Bipartite Entanglement Entropy
7.4. Problems
8. Changelog
9. Bibliography
\(\newcommand{\Hint}[1]{{\flushleft{\bf Hint:}} #1}\)
\(\newcommand{\Tr}{\mathrm{Tr}}\)
\(\renewcommand{\d}{\mathrm{d}}\)
\(\renewcommand{\vec}[1]{\mathbf{#1}}\)
\(\newcommand{\cvec}[1]{\left( \begin{array}{c} #1 \end{array} \right)}\)
\(\newcommand{\ket}[1]{\left\vert #1 \right\rangle}\)
\(\newcommand{\bra}[1]{\left\langle #1 \right\vert}\)
\(\newcommand{\ipop}[3]{\left\langle #1 \left\vert #2 \vphantom{#1 #3} \right\vert #3 \right\rangle}\)
\(\newcommand{\ip}[2]{\left\langle #1 \left\vert #2 \vphantom{#1} \right\rangle \right.}\)
\(\newcommand{\ev}[1]{\left\langle #1 \right\rangle}\)
\(\newcommand{\com}[2]{\left[ \hat{#1}, \hat{#2} \right]}\)
\(\newcommand{\acom}[2]{\left\{ \hat{#1}, \hat{#2} \right\}}\)
\(\newcommand{\norm}[1]{\vert\vert \ket{#1} \vert \vert}\)
7. Information theory
We review some basic concepts of classical and quantum information theory. Essentially this is about quantifying measures of information and quantum entanglement. Typically when we quantify information we describe it in terms of a number of bits, corresponding to the minimum length of message required to convey that information. Note, however, that an actual message will usually be much longer than this, as most communication systems contain redundancy. For example, a paragraph with correct punctuation and grammar will be longer than a text message, yet both can convey exactly the same information, and even most text messages could be compressed (but at the cost of readability).
7.1. Classical Information and Shannon Entropy
One way to describe information is through Shannon Entropy which gives the average number of bits required to specify a message from a set of possible messages where we know the probability of each message. In terms of probability theory, we consider a random variable \(X\) and define \(p(x)\) to be the probability that \(X = x\). Then the Shannon entropy \(H(X)\) is defined to be
\[ H(X) = -\sum_x p(x) \log p(x) \]
where conventionally we take \(\log\) to mean logarithm base \(2\). We also take the convention that \(0 \log 0 = 0\) as we don't expect very small probabilities to be much different from probability zero, and \(\lim_{p \rightarrow 0} p \log p = 0\).
This definition can be interpreted as either a measure of how unsure we are about the value of \(X\), or on average how much information we gain when we learn the value of \(X\). Note that the average is over all possible values \(x\), weighted by the probability that \(X = x\). The quantity \(- \log p(x)\) is a choice of measure of information. The interpretation is that for \(p(x) \simeq 1\), we do not gain much information if \(X\) takes that value since we expected that to be the case anyway. However, if we find \(X = x\) for some value with small \(p(x)\), that is surprising and we gain a lot of information. The choice of normalisation, or reason for taking logarithms base 2, is so that if \(x\) can take one of \(2^N\) values with equal probability, \(H(X) = N\). It is fairly obvious that we could not encode such a message using fewer than \(N\) bits. The fact that in general \(H(X)\) is the lower bound on the average number of bits required to encode the messages is Shannon's Noiseless Coding Theorem. Also note that if we have a fixed set of possible messages, \(H(X)\) is maximised when the probability distribution is uniform. A standard example of a coding which attains the bound is the following.
Suppose there are four possible values with \(p(1) = 1/2\), \(p(2) = 1/4\) and \(p(3) = p(4) = 1/8\). Then \(H(X) = 7/4\). Obviously we could encode the four possible messages using two bits per message. However, we can achieve \(7/4\) bits on average with the following coding, using shorter codewords for more common messages:
\[ 1 \rightarrow 0 \; , \;\; 2 \rightarrow 10 \; , \;\; 3 \rightarrow 110 \; , \;\; 4 \rightarrow 111 \; . \]
It is easy to check that the average length (weighted by the above probabilities) is \(7/4\). Note that the coding is such that we don't need any extra bits to indicate the start or finish of the message. The receiver knows that \(0\) or the third consecutive \(1\) indicates the end of the message. Such messages can be sent one after the other without any ambiguity.
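As a quick numerical check of this example, here is a minimal Python (numpy) sketch that computes \(H(X)\) for the distribution above and the average length of the code just described; both come out to \(7/4\).

```python
import numpy as np

# Probabilities of the four messages in the example above
p = np.array([1/2, 1/4, 1/8, 1/8])

# Shannon entropy H(X) = -sum_x p(x) log2 p(x)
H = -np.sum(p * np.log2(p))

# Codeword lengths of the code 1 -> 0, 2 -> 10, 3 -> 110, 4 -> 111
lengths = np.array([1, 2, 3, 3])
avg_length = np.sum(p * lengths)

print(H, avg_length)   # both equal 1.75 = 7/4
```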
Now consider entropies involving two random variables \(X\) and \(Y\).
7.1.1. Joint Entropy
This is written
\[ H(X, Y) = - \sum_{x, y} p(x, y) \log p(x, y) \; . \]
It obeys a property called subadditivity:
\[ H(X, Y) \le H(X) + H(Y) \; . \]
It is easy to see that \(H(X, Y) = H(X) + H(Y)\) when \(X\) and \(Y\) are independent variables, i.e. when \(p(x, y) = p(X=x) p(Y=y)\).
7.1.2. Relative Entropy
This is defined for two random variables which take the same values but with different probability distributions, say \(p(x)\) and \(q(x)\). The relative entropy of the distribution \(p(x)\) to \(q(x)\) is
\begin{multline} H(p(x) \vert\vert q(x)) = \sum_x \left( p(x)\log p(x) - p(x) \log q(x) \right) \\ = -H(X) - \sum_x p(x) \log q(x) \; . \end{multline}
The relative entropy is non-negative and
\[ H(p(x) \vert\vert q(x)) = 0 \iff p(x) = q(x) \;\; \forall x \; . \]
A corollary of this is the subadditivity property of joint entropy, with equality if and only if the variables are independent. To see this, calculate the relative entropy of the distribution of variables \(X\) and \(Y\) with probabilities \(p(x, y)\) to the distribution with probabilities \(p(X=x)p(Y=y)\) where \(p(X=x) = \sum_y p(x, y)\) and \(p(Y=y) = \sum_x p(x, y)\).
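A minimal numerical sketch of the relative entropy (Python with numpy; the two distributions are chosen arbitrarily for illustration) confirms that it is positive when the distributions differ and zero when they coincide.

```python
import numpy as np

def rel_entropy(p, q):
    """Relative entropy H(p || q) = sum_x p(x)(log2 p(x) - log2 q(x)), with 0 log 0 = 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return np.sum(p[mask] * (np.log2(p[mask]) - np.log2(q[mask])))

# Two distributions over the same four values (chosen arbitrarily for this sketch)
p = np.array([1/2, 1/4, 1/8, 1/8])
q = np.array([1/4, 1/4, 1/4, 1/4])

print(rel_entropy(p, q))   # 0.25 > 0 since p != q
print(rel_entropy(p, p))   # 0.0
```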
7.1.3. Conditional Entropy and Mutual Information
The conditional entropy of \(X\) given \(Y\) is the average (over the values of \(Y\)) uncertainty left about \(X\) when we know \(Y\), or equivalently the information we gain by learning the value of \(X\) if we already knew the value of \(Y\). It is given by
\[ H(X \vert Y) = H(X, Y) - H(Y) \le H(X) \; . \]
A related concept is the mutual information of \(X\) and \(Y\) which is the amount of information we gain about one by knowing the other, or equivalently the information shared (or duplicated) by \(X\) and \(Y\) rather than being in one of them alone:
\begin{multline} H(Y:X) = H(X:Y) = H(X) + H(Y) - H(X,Y) \\[1ex] = H(X) - H(X \vert Y) = H(Y) - H(Y \vert X) \ge 0 \; . \end{multline}
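The identities above are easy to check numerically. The sketch below (Python/numpy, with an arbitrarily chosen correlated joint distribution) computes the joint, conditional and mutual entropies and verifies \(H(X \vert Y) \le H(X)\) and \(H(X:Y) \ge 0\).

```python
import numpy as np

def H(p):
    """Shannon entropy (base 2); zero probabilities are dropped (0 log 0 = 0)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A correlated joint distribution p(x, y); rows index x, columns index y
# (the numbers are arbitrary, chosen only for illustration)
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

HX, HY, HXY = H(px), H(py), H(pxy)
print("H(X,Y) =", HXY, "<=", HX + HY)          # subadditivity
print("H(X|Y) =", HXY - HY, "<= H(X) =", HX)   # conditional entropy
print("H(X:Y) =", HX + HY - HXY)               # mutual information >= 0
```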
7.2. Quantum Entropy
The von Neumann entropy of a quantum state with density operator \(\hat{\rho}\) is defined to be
\[ S(\hat{\rho}) = - \Tr \left( \hat{\rho} \log \hat{\rho} \right) \; . \]
By interpreting the state as an ensemble of (one or more) orthogonal pure states, this is just the Shannon entropy of that ensemble. I.e. we can always diagonalise the density matrix to write \(\hat{\rho} = \sum_i p_i \ket{i}\bra{i}\) where the \(\ket{i}\) are orthonormal states which we can use to form (at least part of) a basis. Then the density matrix is a diagonal matrix with entries \(p_i\) (and perhaps some zeroes), where these diagonal elements are the eigenvalues of \(\hat{\rho}\). Then
\[ - \Tr \left( \hat{\rho} \log \hat{\rho} \right) = -\sum_i p_i \log p_i = H(p_i) \; . \]
Note that for a pure state \(S(\hat{\rho}) = - 1 \log 1 = 0\).
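In practice one computes \(S(\hat{\rho})\) from the eigenvalues of the density matrix. A minimal Python (numpy) sketch, using a pure state and an equal mixture as illustrative examples:

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log2 rho), computed from the eigenvalues of rho."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]        # zero eigenvalues contribute nothing
    return -np.sum(lam * np.log2(lam))

# A pure state has zero entropy ...
plus = np.array([1, 1]) / np.sqrt(2)
print(von_neumann_entropy(np.outer(plus, plus)))   # 0

# ... while an equal mixture of |0> and |1> has entropy of 1 bit
print(von_neumann_entropy(np.diag([0.5, 0.5])))    # 1.0
```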
As for the classical case, we can use this to define various related concepts.
7.2.1. Relative Entropy
This is a measure of the difference (or in fact a notion of the distance) between two states (in the full system) with density matrices \(\hat{\rho}_1\) and \(\hat{\rho}_2\):
\[ S(\hat{\rho}_1 \vert\vert \hat{\rho}_2) = \Tr \left( \hat{\rho}_1 \log \hat{\rho}_1 \right) - \Tr \left( \hat{\rho}_1 \log \hat{\rho}_2 \right) \ge 0 \]
with equality if and only if \(\hat{\rho}_1 = \hat{\rho}_2\).
7.2.2. Joint Entropy, Conditional Entropy and Mutual Information
If we have a bipartite system with Hilbert space \(\mathcal{H} = \mathcal{H}_A \otimes \mathcal{H}_B\), a density matrix \(\hat{\rho}\) and reduced density matrices \(\hat{\rho}_A\) and \(\hat{\rho}_B\) for each system, we write
\[ S(A) = S(\hat{\rho}_A) \; , \;\; S(B) = S(\hat{\rho}_B) \; , \;\; S(A, B) = S(\hat{\rho}) \]
where \(S(A, B)\) is called the joint entropy of systems \(A\) and \(B\).
By analogy with the classical case we can also define the conditional entropy
\[ S(A \vert B) = S(A, B) - S(B) \]
but unlike the classical case this can be negative. Indeed, if \(\hat{\rho}\) is a pure state, \(S(A, B) = 0\) but this does not imply \(S(B) = 0\), so in this case we have \(S(A \vert B) = - S(B) \le 0\). The point here is that initially we know everything about the system since it is in a pure state, but if, say, Bob makes a measurement, this typically results in Alice having a mixed state, so she can only be certain what her state is once Bob communicates the result of his measurement.
Measurement turns a pure state (entropy \(S(A,B)=0\)) into a mixed state (entropy \(S(A)=S(B) > 0\)). The conditional entropy \(S(A|B)\) can thus be negative.
The mutual information is written \(I(A:B)\) or \(S(A:B)\) and is given by
\[ I(A:B) = S(A) + S(B) - S(A, B) = S(A) - S(A \vert B) = S(B) - S(B \vert A) \; . \]
In the case where \(\hat{\rho}\) is a pure state
\[ I(A:B) = S(A) + S(B) - S(A, B) = S(A) + S(B) = 2S(A) \]
so we see that entanglement can be interpreted as mutual information, i.e. the information shared by \(A\) and \(B\) and not in either subsystem alone. This agrees with what we have seen for Bell states, which are maximally entangled 2-qubit states. The reduced density matrix for each qubit is \(\frac{1}{2}I\). As we have seen earlier, each qubit alone gives no information about which of the four Bell states we have; this information must be stored in the entanglement, i.e. it is a shared quantity.
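This is straightforward to verify numerically. The following sketch (Python/numpy, assuming the Bell state \(\ket{\beta_{00}} = (\ket{00}+\ket{11})/\sqrt{2}\)) computes the reduced density matrices by partial trace and checks that \(I(A:B) = 2S(A) = 2\).

```python
import numpy as np

def S(rho):
    """von Neumann entropy (base 2) from the eigenvalues of a density matrix."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return -np.sum(lam * np.log2(lam))

# Bell state |beta_00> = (|00> + |11>)/sqrt(2) in the standard 2-qubit basis
psi = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho = np.outer(psi, psi)

# Reduced density matrices via partial trace; indices are (a, b, a', b')
rho4 = rho.reshape(2, 2, 2, 2)
rho_A = np.einsum('abcb->ac', rho4)   # trace over subsystem B
rho_B = np.einsum('abad->bd', rho4)   # trace over subsystem A

print(rho_A)                                       # [[0.5, 0], [0, 0.5]] = I/2
print(S(rho_A), S(rho_B), S(rho))                  # S(A) = S(B) = 1, S(A,B) = 0
print("I(A:B) =", S(rho_A) + S(rho_B) - S(rho))    # 2 = 2 S(A)
```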
7.3. Bipartite Entanglement Entropy
The aim is to quantify the quantum entanglement (as opposed to the classical correlations) between two subsystems of a quantum system. We will assume that the full system is in a pure state \(\hat{\rho}\). Then we define the entanglement entropy to be
\[ S(A) = S(B) \; . \]
Entanglement entropy is the von Neumann entropy of the reduced density matrix (of either subsystem).
We will see below that these quantities are indeed equal since the reduced density matrices have the same non-zero eigenvalues, even if \(\mathcal{H}_A\) and \(\mathcal{H}_B\) have different dimensions (in which case the number of zero eigenvalues differs, but they don't contribute to the von Neumann entropy.)
7.3.1. Schmidt Decomposition and Schmidt Number
Diagonalise \(\hat{\rho}_A\) so that we have non-zero probabilities \(p_i\) and orthonormal states \(\ket{i} \in \mathcal{H}_A\) with
\[ \hat{\rho}_A = \sum_i p_i \ket{i}\bra{i} \; . \]
Note that the range of \(i\) is the number of non-zero eigenvalues of \(\hat{\rho}_A\) which may be less than the dimension of \(\mathcal{H}_A\). Now, since the full system is in some pure state \(\ket{\Psi} \in \mathcal{H}_A \otimes \mathcal{H}_B\), we can write
\[ \ket{\Psi} = \sum_i \sqrt{p_i} \ket{i} \otimes \ket{\phi_i} \; . \]
Checking that this gives the correct expression for \(\hat{\rho}_A\) shows that the \(\ket{\phi_i}\) are orthonormal states in \(\mathcal{H}_B\), and so the number of non-zero probabilities is also no larger than the dimension of \(\mathcal{H}_B\). We call this number of non-zero probabilities the Schmidt number \(N_S\). Note that we can calculate
\[ \hat{\rho}_B = \sum_i p_i \ket{\phi_i}\bra{\phi_i} \]
which shows that \(S(A) = S(B)\).
The Schmidt number is a (somewhat crude, as it is integer valued) measure of the entanglement between systems \(A\) and \(B\). It satisfies the following properties:
  • It is invariant under unitary transformation in \(A\) or \(B\).
  • Any measurements made locally in \(A\) or \(B\) cannot increase the Schmidt number.
  • The Schmidt number is \(1\) if and only if \(\hat{\rho}_A\) and \(\hat{\rho}_B\) are pure states, or equivalently \(\hat{\rho}\) is a separable pure state.
We know that for a given number of non-zero probabilities, the Shannon entropy is maximised when the probabilities are equal. So here we have
\[ S(A) = S(B) = H(p_i) \le \log N_S \le \log D \]
where \(D = \min \left\{ \dim \mathcal{H}_A, \dim \mathcal{H}_B \right\} \).
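Numerically, the Schmidt decomposition of a pure state is obtained from the singular value decomposition of its coefficient matrix \(c_{ab}\) in a product basis, the singular values being the \(\sqrt{p_i}\). A short Python (numpy) sketch for a randomly chosen state with \(\dim\mathcal{H}_A = 2\) and \(\dim\mathcal{H}_B = 3\):

```python
import numpy as np

# A random pure state in H_A (x) H_B with dim H_A = 2, dim H_B = 3, written as
# a coefficient matrix c[a, b] so that |Psi> = sum_{a,b} c[a,b] |a> (x) |b>
rng = np.random.default_rng(1)
c = rng.normal(size=(2, 3)) + 1j * rng.normal(size=(2, 3))
c /= np.linalg.norm(c)                       # normalise |Psi>

# The singular values of c are the Schmidt coefficients sqrt(p_i)
s = np.linalg.svd(c, compute_uv=False)
p = s**2
p = p[p > 1e-12]

print("Schmidt number N_S =", len(p))              # at most min(2, 3) = 2
print("S(A) = S(B) =", -np.sum(p * np.log2(p)))    # entanglement entropy
print("log2 D =", np.log2(min(c.shape)))           # upper bound, here 1
```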
7.4. Problems
  1.
    1. Show that the operator \(\hat{S} = \vec{n}\cdot\vec{\sigma}\), where \(\vec{n}\) is a three-dimensional unit vector, has eigenvalues \(\pm 1\).
    2. Recall that any qubit density matrix can be written in terms of a Bloch vector \(\vec{r}\). Calculate the expectation value of \(\hat{S}\) in terms of \(\vec{n}\) and \(\vec{r}\).
    3. By considering the range of values possible in part (b), state what the eigenstates of \(\hat{S}\) are in terms of a relation between \(\vec{r}\) and \(\vec{n}\).
    4. For each Bell state, how is the expectation value of \(\hat{S} \otimes \hat{I}\) related to that of \(\hat{I} \otimes \hat{S}\)?
    5. For each Bell state, calculate the expectation value of \(\hat{S} \otimes \sigma_3\) and \(\hat{S} \otimes \sigma_1\).
    Solution:
    1. You could write out \(\hat{S}\) as a \(2 \times 2\) matrix and calculate the eigenvalues. Alternatively, note that since \(\Tr \sigma_i = 0\), \(\Tr \hat{S} = 0\) which means that the sum of the (two) eigenvalues is \(0\). Then note that \(\hat{S}^2 = \hat{I}\) since
      \[ (\vec{n}\cdot\vec{\sigma})^2 = n_i n_j \sigma_i \sigma_j = \frac{1}{2} (n_i n_j \sigma_i \sigma_j + n_j n_i \sigma_j \sigma_i)= \frac{1}{2} n_i n_j (\sigma_i \sigma_j + \sigma_j \sigma_i) = \frac{1}{2} n_i n_j 2 \delta_{ij} I = (\vec{n} \cdot \vec{n}) I = I \; . \]
      Therefore each eigenvalue must square to \(1\), hence the two eigenvalues are \(1\) and \(-1\).
    2. Since the trace is cyclic
      \[ \Tr (\sigma_i \sigma_j) = \Tr (\sigma_j \sigma_i) = \frac{1}{2} \Tr (\sigma_i \sigma_j + \sigma_j \sigma_i) = \frac{1}{2} \Tr (2\delta_{ij} I) = \delta_{ij} \Tr I = 2 \delta_{ij} \; . \]
      Therefore we have
      \[ \ev{\hat{S}} = \Tr (\hat{\rho}\hat{S}) = \frac{1}{2} \Tr \Big( (I + r_i\sigma_i)n_j \sigma_j \Big) = \frac{1}{2} r_i n_j 2 \delta_{ij} = \vec{r}\cdot\vec{n} \; . \]
    3. Since \(|\vec{r}| \le 1\) and \(\vec{n}\) is a unit vector, \(-1 \le \vec{r}\cdot\vec{n} \le 1\). On the other hand, for an eigenstate of \(\hat{S}\), the expectation value is equal to the eigenvalue. Since the expectation value is an average over the possible outcomes (given by the eigenvalues), in this case any state which is not an eigenstate must have expectation value strictly between \(-1\) and \(1\). Hence the eigenstates are exactly the states with \(\vec{r} = \pm \vec{n}\).
    4. One approach is to directly calculate the expectation value by considering the action of these operators on the Bell states. Another approach is to note that we can calculate the trace in a bipartite system by first taking the partial trace in one subsystem, then taking the trace in the other. So
      \[ \ev{\hat{S} \otimes \hat{I}} = \Tr (\hat{\rho}(\hat{S} \otimes \hat{I})) = \Tr_A \Tr_B (\hat{\rho}(\hat{S} \otimes \hat{I})) = \Tr_A (\hat{\rho}_A\hat{S}) \; . \]
      Similarly
      \[ \ev{\hat{I} \otimes \hat{S}} = \Tr_B (\hat{\rho}_B\hat{S}) \; . \]
      Also, recall that for any Bell state both reduced density matrices are equal to \(\frac{1}{2}\hat{I}\). This means that in all cases we have the same result
      \[ \ev{\hat{S} \otimes \hat{I}} = \ev{\hat{I} \otimes \hat{S}} = \frac{1}{2}\Tr(\hat{S}) \; . \]
      The above is true for any operator \(\hat{S}\), but in this specific case we also know that \(\Tr(\hat{S}) =0\).
    5. Consider
      \begin{eqnarray*} \sqrt{2} \hat{S} \otimes \sigma_3 \ket{\beta_{xy}} & = & \hat{S} \ket{0} \otimes \sigma_3 \ket{y} + (-1)^x \hat{S} \ket{1} \otimes \sigma_3 \ket{\overline{y}} \\ & = & (-1)^y \left( \hat{S} \ket{0} \otimes \ket{y} - (-1)^x \hat{S} \ket{1} \otimes \ket{\overline{y}} \right) \end{eqnarray*}
      So we see that
      \[ 2 \ev{\hat{S} \otimes \sigma_3} = 2 \bra{\beta_{xy}} \hat{S} \otimes \sigma_3 \ket{\beta_{xy}} = (-1)^y \left( \bra{0} \hat{S} \ket{0} - \bra{1} \hat{S} \ket{1} \right) \]
      The above didn't use any properties of \(\hat{S}\). It is easy to now evaluate explicitly, noting that only \(\sigma_3\) contributes since the other \(\sigma\)-matrices have vanishing diagonal components. Alternatively, note that we can also use \(\sigma_3 \ket{0} = \ket{0}\) and \(\sigma_3 \ket{1} = -\ket{1}\) to write
      \[ 2 \ev{\hat{S} \otimes \sigma_3} = (-1)^y \left( \bra{0} \hat{S} \sigma_3 \ket{0} + \bra{1} \hat{S} \sigma_3 \ket{1} \right) = (-1)^y \Tr ( \hat{S} \sigma_3 ) \; . \]
      Then, since \(\Tr(\sigma_i \sigma_j) = 2 \delta_{ij}\) we see that
      \[ \ev{\hat{S} \otimes \sigma_3} = (-1)^y n_3 \; . \]
      Similarly, note that since \(\sigma_1 \ket{0} = \ket{1}\) and \(\sigma_1 \ket{1} = \ket{0}\) we can write \(\sigma_1 \ket{y} = \ket{\overline{y}}\) so
      \begin{eqnarray*} \sqrt{2} \hat{S} \otimes \sigma_1 \ket{\beta_{xy}} & = & \hat{S} \ket{0} \otimes \sigma_1 \ket{y} + (-1)^x \hat{S} \ket{1} \otimes \sigma_1 \ket{\overline{y}} \\ & = & \hat{S} \ket{0} \otimes \ket{\overline{y}} + (-1)^x \hat{S} \ket{1} \otimes \ket{y} \\ & = & \hat{S} \sigma_1 \ket{1} \otimes \ket{\overline{y}} + (-1)^x \hat{S} \sigma_1 \ket{0} \otimes \ket{y} \\ & = & (-1)^{x} \sqrt{2} \, (\hat{S} \sigma_1 \otimes \hat{I}) \ket{\beta_{xy}} \; . \end{eqnarray*}
      As above we can then easily calculate
      \[ \ev{\hat{S} \otimes \sigma_1} = (-1)^{x} \frac{1}{2} \Tr(\hat{S} \sigma_1) = (-1)^{x} n_1 \; . \]
      For completeness, note that we can calculate the similar expression with \(\sigma_2\). Using \(\sigma_2 \ket{0} = i\ket{1}\) and \(\sigma_2 \ket{1} = -i \ket{0}\) we can write \(\sigma_2 \ket{y} = i(-1)^y \ket{\overline{y}}\) so
      \begin{eqnarray*} \sqrt{2} \hat{S} \otimes \sigma_2 \ket{\beta_{xy}} & = & \hat{S} \ket{0} \otimes \sigma_2 \ket{y} + (-1)^x \hat{S} \ket{1} \otimes \sigma_2 \ket{\overline{y}} \\ & = & i(-1)^y \left( \hat{S} \ket{0} \otimes \ket{\overline{y}} - (-1)^x \hat{S} \ket{1} \otimes \ket{y} \right) \\ & = & -(-1)^y \left( \hat{S} \sigma_2 \ket{1} \otimes \ket{\overline{y}} + (-1)^x \hat{S} \sigma_2 \ket{0} \otimes \ket{y} \right) \\ & = & -(-1)^{x+y} \sqrt{2} \, (\hat{S} \sigma_2 \otimes \hat{I}) \ket{\beta_{xy}} \; . \end{eqnarray*}
      As above we can then easily calculate
      \[ \ev{\hat{S} \otimes \sigma_2} = -(-1)^{x+y} \frac{1}{2} \Tr(\hat{S} \sigma_2) = -(-1)^{x+y} n_2 \; . \]
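      These results are easy to verify numerically. A short Python (numpy) check for all four Bell states \(\ket{\beta_{xy}} = (\ket{0}\otimes\ket{y} + (-1)^x\ket{1}\otimes\ket{\overline{y}})/\sqrt{2}\) and an arbitrarily chosen unit vector \(\vec{n}\):

```python
import numpy as np

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# An arbitrary unit vector n and the operator S = n . sigma
n = np.array([0.3, -0.5, 0.7])
n /= np.linalg.norm(n)
S = n[0] * sx + n[1] * sy + n[2] * sz

def bell(x, y):
    """|beta_xy> = (|0>|y> + (-1)^x |1>|y-bar>)/sqrt(2)."""
    e = [np.array([1, 0]), np.array([0, 1])]
    return (np.kron(e[0], e[y]) + (-1)**x * np.kron(e[1], e[1 - y])) / np.sqrt(2)

for x in (0, 1):
    for y in (0, 1):
        b = bell(x, y)
        ev = lambda op: np.real(b.conj() @ op @ b)
        print(x, y,
              np.isclose(ev(np.kron(S, sz)), (-1)**y * n[2]),          # (-1)^y n_3
              np.isclose(ev(np.kron(S, sx)), (-1)**x * n[0]),          # (-1)^x n_1
              np.isclose(ev(np.kron(S, sy)), -(-1)**(x + y) * n[1]))   # -(-1)^(x+y) n_2
```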
  2.
    Suppose a device performs a unitary transformation on the tensor product of any input state with a fixed state \(\ket{\Omega}\). For two different input states \(\ket{\psi}\) and \(\ket{\phi}\) the device seems to be a quantum copier, i.e.
    it maps
    \[ \ket{\psi} \otimes \ket{\Omega} \rightarrow \ket{\psi} \otimes \ket{\psi} \;\; \mathrm{and} \;\; \ket{\phi} \otimes \ket{\Omega} \rightarrow \ket{\phi} \otimes \ket{\phi} . \]
    Show that this is only possible if the two states \(\ket{\psi}\) and \(\ket{\phi}\) are either the same or orthogonal.
    Solution:
    Unitary transformations preserve inner products. Therefore the norm of \(\ket{\psi} \otimes \ket{\Omega}\) must equal the norm of \(\ket{\psi} \otimes \ket{\psi}\), which means that \(\ket{\psi}\) and \(\ket{\Omega}\) have the same norm. Similarly we see that \(\ket{\phi}\) also has the same norm.
    Now compare the inner product of the two states before and after the transformation, i.e.
    \[ \ip{\psi}{\phi} \ip{\Omega}{\Omega} = (\ip{\psi}{\phi})^2 \]
    so either \(\ip{\psi}{\phi} = 0\), in which case \(\ket{\psi}\) and \(\ket{\phi}\) are orthogonal, or \(\ip{\psi}{\phi} = \ip{\Omega}{\Omega}\). Since the states all have the same norm, the final relation is only possible if \(\ket{\psi} = \ket{\phi}\).
  3. Show that the Shannon entropy of a random variable which can take \(N\) different values is maximised if and only if the probability distribution is uniform. Calculate the maximum Shannon entropy (for a given fixed \(N\)).
    Solution:
    Recall that the Shannon entropy is
    \[ H = - \sum_{i=1}^N p_i \log p_i \]
    where the \(p_i\) are the probabilities of each possible outcome. Note that these are not \(N\) independent variables since we have the constraint \(\sum_{i=1}^N p_i = 1\). There are two obvious ways to look for the maximum value. We could substitute \(p_N = 1 - \sum_{i=1}^{N-1} p_i\) and then set \(\frac{\partial H}{\partial p_i} = 0\) for \(i \in \{ 1, 2, \ldots , N-1 \}\) or we could instead introduce a Lagrange multiplier \(\lambda\) and set \(\frac{\partial \tilde{H}}{\partial \lambda} = 0\) and \(\frac{\partial \tilde{H}}{\partial p_i} = 0\) for \(i \in \{ 1, 2, \ldots , N \}\) where
    \[ \tilde{H} = H + \lambda \left( \sum_{i=1}^N p_i - 1 \right) \; . \]
    Using the second method we find
    \begin{eqnarray*} 0 = \frac{\partial \tilde{H}}{\partial \lambda} & = & \sum_{i=1}^N p_i - 1 \\ 0 = \frac{\partial \tilde{H}}{\partial p_i} & = & - \log p_i - \frac{1}{\ln 2} + \lambda \end{eqnarray*}
    The second equation shows that all \(p_i\) have the same value, and so from the first equation (imposing the constraint on the total probability) we see that \(p_i = 1/N\). In this case we find
    \[ H = - \sum_{i=1}^N \frac{1}{N} \log \left( \frac{1}{N} \right) = \log N \; . \]
    You could check that this is really a maximum and not some other turning point. A simple argument is that since we only found one turning point, the only other possibility for a maximum is at the boundary of the region of parameter space, i.e. where some of the \(p_i = 0\). (This includes the case where one \(p_i = 1\) since then all other \(p_i\) must vanish.) However, these boundary points give lower values for \(H\): if \(M \in \{ 1, 2, \ldots , N-1 \}\) of the probabilities vanish, the entropy is at most \(\log (N - M) < \log N\).
  4. Suppose we have messages encoded as strings of \(N\) bits, where each bit can be considered a random variable having value \(1\) with probability \(p\).
    1. Calculate the Shannon entropy of a single bit.
    2. What is the Shannon entropy of a string of \(N\) bits? (Recall that if \(X\) and \(Y\) are independent random variables, \(H(X,Y) = H(X) + H(Y)\).)
    3. Suppose \(N=1000\) and \(p=3/4\). What is the minimum average length of message which could contain the same information? (You do not have to give a specific method to encode the message.)
    Solution:
    1. We have two possible outcomes, \(1\) with probability \(p\) and \(0\) with probability \(1-p\). Therefore the Shannon entropy is
      \[ H = -p \log p - (1-p) \log(1-p) \; . \]
    2. Let's consider a string of \(N\) bits, \(X_N\), as a string of \(N-1\) bits, \(X_{N-1}\), plus a single bit, \(X_1\). Since the bits are independent
      \[ H(X_N) \equiv H(X_{N-1}, X_1) = H(X_{N-1}) + H(X_1) \; . \]
      So, by induction we find
      \[ H(X_N) = N H(X_1) = -Np \log p - N(1-p) \log(1-p) \]
      using the result in part (a) for \(H(X_1)\).
    3. For \(p = 3/4\) we find
      \[ H(X_{1000}) = -750 \log (3/4) - 250 \log (1/4) = 750 \times 2 - 750 \log 3 + 250 \times 2 = 2000 - 750 \log 3 \simeq 811.3 \]
      so such a message could on average be encoded using only about \(811.3\) bits.
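      A quick numerical check of this value (Python/numpy):

```python
import numpy as np

p, N = 3/4, 1000
H1 = -p * np.log2(p) - (1 - p) * np.log2(1 - p)   # entropy of a single bit
print(N * H1, 2000 - 750 * np.log2(3))            # both approx 811.28
```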
  5. Suppose two bits are described by random variables \(X\) and \(Y\).
    1. Calculate the joint Shannon entropy, the conditional entropy of \(X\) given \(Y\), the conditional entropy of \(Y\) given \(X\), and the mutual information in each of the following cases, where \((X,Y)\) can take (with equal probability) only the values:
      1. \((0,0), (0,1), (1,0), (1,1)\)
      2. \((0,1), (1,0), (1,1)\)
      3. \((0,1), (1,0)\)
      4. \((0,1), (1,1)\)
    2. Calculate the relative entropy for the above probability distributions of
      • Case (iii) to case (ii)
      • Case (iii) to case (i)
      • Case (ii) to case (i)
      • Case (iv) to case (ii)
      • Case (iv) to case (i)
    Solution:
    1. Let's use notation \(p_x = P(X=x)\), \(p_y = P(Y=y)\) and \(p_{x,y} = P(X=x \; \& \; Y=y)\), where for the different cases we have the following probabilities
      \[ \begin{array}{rcccccccc} \mathrm{Case} & P(X=0) & P(X=1) & P(Y=0) & P(Y=1) & p_{0,0} & p_{0,1} & p_{1,0} & p_{1,1} \\[0.3cm] (i) & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\[0.2cm] (ii) & \frac{1}{3} & \frac{2}{3} & \frac{1}{3} & \frac{2}{3} & 0 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\[0.2cm] (iii) & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & 0 & \frac{1}{2} & \frac{1}{2} & 0 \\[0.2cm] (iv) & \frac{1}{2} & \frac{1}{2} & 0 & 1 & 0 & \frac{1}{2} & 0 & \frac{1}{2} \\ \end{array} \]
      1. First let's calculate the Shannon entropy for X and for Y, then the joint entropy. Remember that for these calculations we take logarithms in base \(2\).
        \begin{eqnarray*} H(X) & \equiv & -\sum_x p_x \log(p_x) = -2 \times \frac{1}{2} \log \left( \frac{1}{2} \right) = 1 \\ H(Y) & \equiv & -\sum_y p_y \log(p_y) = -2 \times \frac{1}{2} \log \left( \frac{1}{2} \right) = 1 \\ H(X, Y) & \equiv & -\sum_{x,y} p_{x,y} \log(p_{x,y}) = -4 \times \frac{1}{4} \log \left( \frac{1}{4} \right) = 2 \end{eqnarray*}
        We can now calculate the conditional entropy of \(X\) given \(Y\), \(H(X \vert Y)\) and similarly of \(Y\) given \(X\), \(H(Y \vert X)\)
        \begin{eqnarray*} H(X \vert Y) & \equiv & H(X, Y) - H(Y) = 2 - 1 = 1 \\ H(Y \vert X) & \equiv & H(X, Y) - H(X) = 2 - 1 = 1 \end{eqnarray*}
        noting that \(H(X, Y) = H(Y, X)\). Finally, the mutual information is
        \[ H(X : Y) \equiv H(X) + H(Y) - H(X, Y) = 1 + 1 - 2 = 0 \; . \]
      2. \begin{eqnarray*} H(X) & = & - \frac{1}{3} \log \left( \frac{1}{3} \right) - \frac{2}{3} \log \left( \frac{2}{3} \right) = \log(3) - \frac{2}{3} \\ H(Y) & = & - \frac{1}{3} \log \left( \frac{1}{3} \right) - \frac{2}{3} \log \left( \frac{2}{3} \right) = \log(3) - \frac{2}{3} \\ H(X, Y) & = & -3 \times \frac{1}{3} \log \left( \frac{1}{3} \right) = \log(3) \\ H(X \vert Y) & = & \log(3) - \left( \log(3) - \frac{2}{3} \right) = \frac{2}{3} \\ H(Y \vert X) & = & \log(3) - \left( \log(3) - \frac{2}{3} \right) = \frac{2}{3} \\ H(X : Y) & = & \log(3) - \frac{4}{3} \end{eqnarray*}
      3. \begin{eqnarray*} H(X) & = & - 2 \times \frac{1}{2} \log \left( \frac{1}{2} \right) = 1 \\ H(Y) & = & - 2 \times \frac{1}{2} \log \left( \frac{1}{2} \right) = 1 \\ H(X, Y) & = & - 2 \times \frac{1}{2} \log \left( \frac{1}{2} \right) = 1 \\ H(X \vert Y) & = & 1 - 1 = 0 \\ H(Y \vert X) & = & 1 - 1 = 0 \\ H(X : Y) & = & 1 + 1 - 1 = 1 \end{eqnarray*}
      4. \begin{eqnarray*} H(X) & = & - 2 \times \frac{1}{2} \log \left( \frac{1}{2} \right) = 1 \\ H(Y) & = & - 1 \log \left( 1 \right) = 0 \\ H(X, Y) & = & - 2 \times \frac{1}{2} \log \left( \frac{1}{2} \right) = 1 \\ H(X \vert Y) & = & 1 - 0 = 1 \\ H(Y \vert X) & = & 1 - 1 = 0 \\ H(X : Y) & = & 1 + 0 - 1 = 0 \end{eqnarray*}
    2. For the relative entropy let's label the probability distribution of case A, \(p_{x, y}\) and the probability distribution of case B, \(q_{x, y}\). Then the relative entropy of case A to case B is
      \[ H(A \vert\vert B) \equiv \sum_{x, y} p_{x, y} \left( \log(p_{x, y}) - \log(q_{x, y}) \right) \; . \]
      Note that it is not symmetric. We then find
      \begin{eqnarray*} H(iii \vert\vert ii) = 2 \times \frac{1}{2} \left( \log \left( \frac{1}{2} \right) - \log \left( \frac{1}{3} \right) \right) = \log(3) - 1 \\ H(iii \vert\vert i) = 2 \times \frac{1}{2} \left( \log \left( \frac{1}{2} \right) - \log \left( \frac{1}{4} \right) \right) = -1 + 2 = 1 \\ H(ii \vert\vert i) = 3 \times \frac{1}{3} \left( \log \left( \frac{1}{3} \right) - \log \left( \frac{1}{4} \right) \right) = 2 - \log(3) \\ H(iv \vert\vert ii) = 2 \times \frac{1}{2} \left( \log \left( \frac{1}{2} \right) - \log \left( \frac{1}{3} \right) \right) = \log(3) - 1 \\ H(iv \vert\vert i) = 2 \times \frac{1}{2} \left( \log \left( \frac{1}{2} \right) - \log \left( \frac{1}{4} \right) \right) = 1 \end{eqnarray*}
  6.
    Consider a qubit with density matrix
    \[ \rho = \frac{1}{2} \left( \begin{array}{cc} 1+z & 0 \\ 0 & 1-z \end{array} \right) \]
    1. Calculate the von Neumann entropy \(S(\rho)\) as a function of \(z\).
    2. Show that \(S(\rho)\) is a monotonic function for \(z \in [0,1]\) and find the minimum and maximum values of the entropy.
    3. Calculate the entanglement entropy, as a function of \(\theta\), of the state
      \[ \ket{\Psi} = \cos\theta \ket{0} \otimes \ket{0} + \sin\theta \ket{1} \otimes \ket{1} . \]
    Solution:
    1. Since \(\rho\) is diagonal we can easily calculate
      \begin{eqnarray*} S(\rho) & = & -\Tr \left( \rho \log \rho \right) = -\frac{1+z}{2} \log \left( \frac{1+z}{2} \right) -\frac{1-z}{2} \log \left( \frac{1-z}{2} \right) \\ & = & 1 - \frac{1}{2} (1+z) \log (1+z) - \frac{1}{2} (1-z) \log (1-z) \end{eqnarray*}
    2. We can then easily find
      \[ \frac{dS}{dz} = - \frac{1}{2} \log (1+z) - \frac{1}{2 \ln 2} + \frac{1}{2} \log (1-z) + \frac{1}{2 \ln 2} = \frac{1}{2} \log \left( \frac{1-z}{1+z} \right) \; . \]
      For the given range of \(z\), this quantity vanishes at the endpoint \(z=0\) and is negative for all other values of \(z\), so the function is monotonic with maximum \(S = 1\) at \(z=0\) and minimum
      \[ S = 1 - \frac{1}{2} 2 \log 2 - \frac{1}{2} 0 \log 0 = 1 - 1 - 0 = 0 \]
      with our usual understanding that \(0 \log 0 = 0\) by considering it as the limit \(\lim_{p \rightarrow 0} p \log p\).
    3. The entanglement entropy is the von Neumann entropy of the reduced density matrix (in either subsystem). Here we have
      \[ \hat{\rho}_A = \Tr_B \left( \cos^2\theta \ket{0}\bra{0} \otimes \ket{0}\bra{0} + \sin^2\theta \ket{1}\bra{1} \otimes \ket{1}\bra{1} \right) = \cos^2\theta \ket{0}\bra{0} + \sin^2\theta \ket{1}\bra{1} \]
      Therefore the entanglement entropy is
      \[ S(A) = S( \hat{\rho}_A ) = -\cos^2\theta \log (\cos^2\theta) - \sin^2\theta \log (\sin^2\theta) \; . \]
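      As a check, the following Python (numpy) sketch computes the reduced density matrix by partial trace for an arbitrary value of \(\theta\) and compares with the closed form above.

```python
import numpy as np

def S(rho):
    """von Neumann entropy (base 2) of a density matrix."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return -np.sum(lam * np.log2(lam))

theta = np.pi / 6                     # an arbitrary angle, for illustration
c, s = np.cos(theta), np.sin(theta)

# |Psi> = cos(theta)|00> + sin(theta)|11> and the reduced density matrix rho_A
psi = np.array([c, 0, 0, s])
rho_A = np.einsum('abcb->ac', np.outer(psi, psi).reshape(2, 2, 2, 2))

closed_form = -c**2 * np.log2(c**2) - s**2 * np.log2(s**2)
print(S(rho_A), closed_form)          # both approx 0.811
```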
  7. Calculate the von Neumann entropy of
    1. \(\frac{1}{\sqrt{2}}\left( \ket{0} + \ket{1} \right)\)
    2. The ensemble of \(\ket{0}\) and \(\ket{1}\) with equal probabilities.
    and calculate the relative entropy of state (i) to state (ii).
    Solution:
    The state (i) is a pure state so its von Neumann entropy is \(0\). If you want to see this explicitly, work in the orthonormal basis \(\{ \ket{\pm} = (\ket{0} \pm \ket{1})/\sqrt{2} \}\) and represent \(\ket{+}\) by \(\cvec{1 \\ 0}\) and \(\ket{-}\) by \(\cvec{0 \\ 1}\). Then the state (i) is just \(\ket{+}\) so its density matrix is
    \[ \rho = \cvec{1 \\ 0} ( 1 \;\; 0) = \left( \begin{array}{cc} 1 & 0 \\ 0 & 0 \end{array} \right) \; . \]
    Therefore we have
    \[ S(\rho) = -1 \log 1 - 0 \log 0 = 0 \; . \]
    For the mixed state (ii) we can work in the usual basis, representing \(\ket{0}\) by \(\cvec{1 \\ 0}\) and \(\ket{1}\) by \(\cvec{0 \\ 1}\). Then we easily find
    \[ \rho = \frac{1}{2} \cvec{1 \\ 0} ( 1 \;\; 0) + \frac{1}{2} \cvec{0 \\ 1} ( 0 \;\; 1) = \left( \begin{array}{cc} 1/2 & 0 \\ 0 & 1/2 \end{array} \right) \]
    giving the von Neumann entropy
    \[ S(\rho) = -\frac{1}{2} \log \left( \frac{1}{2} \right) -\frac{1}{2} \log \left( \frac{1}{2} \right) = \log 2 = 1 \; . \]
    Now, to calculate the relative entropy we use the definition
    \[ S(\rho_1 \vert\vert \rho_2) = \Tr(\rho_1 \log \rho_1) - \Tr(\rho_1 \log \rho_2) \; . \]
    Here we take \(\rho_1\) to be the density matrix for state (i), and \(\rho_2\) for state (ii). The first term is just minus the von Neumann entropy of state (i), so we know from above that this is \(0\). To calculate the second term we must write the two density matrices in the same representation. There are two natural ways to do this. We could note that since \(\rho_2\) is proportional to the identity matrix in one orthonormal basis, it is the same in any orthonormal basis. However, to illustrate the method more generally, let's work in the basis representing \(\ket{0}\) by \(\cvec{1 \\ 0}\) and \(\ket{1}\) by \(\cvec{0 \\ 1}\). Note that when choosing the basis, it is important that \(\rho_2\) is diagonal so that we can calculate \(\log \rho_2\) by just taking the logarithm of each diagonal element. In this representation state (i) is \(\frac{1}{\sqrt{2}}\cvec{1 \\ 1}\) so
    \[ \rho_1 = \frac{1}{2} \cvec{1 \\ 1} ( 1 \;\; 1) = \frac{1}{2} \left( \begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array} \right) \; . \]
    Then we can calculate
    \begin{eqnarray*} \Tr(\rho_1 \log \rho_2) & = & \Tr \left[ \frac{1}{2} \left( \begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array} \right) \left( \begin{array}{cc} \log(1/2) & 0 \\ 0 & \log(1/2) \end{array} \right) \right] \\ & = & -\frac{1}{2} \Tr \left( \begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array} \right) = -1 \end{eqnarray*}
    which gives
    \[ S(\rho_1 \vert\vert \rho_2) = 0 - (-1) = 1 \; . \]
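    The same answer drops out of a direct matrix computation (a short Python/numpy sketch of the calculation above):

```python
import numpy as np

plus = np.array([1, 1]) / np.sqrt(2)
rho1 = np.outer(plus, plus)          # pure state (|0> + |1>)/sqrt(2)
rho2 = np.eye(2) / 2                 # equal mixture of |0> and |1>

# S(rho1 || rho2) = Tr(rho1 log rho1) - Tr(rho1 log rho2); the first term
# vanishes for a pure state, and rho2 is diagonal so log rho2 is diagonal.
log_rho2 = np.diag(np.log2(np.diag(rho2)))
print(-np.trace(rho1 @ log_rho2))    # 1.0
```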
  8. Consider the state
    \[ \ket{\Psi} = \cos\theta \ket{0} \otimes \ket{0} + \sin\theta \ket{1} \otimes \ket{1} \in \mathcal{H}_A \otimes \mathcal{H}_B . \]
    Calculate the relative entropy of this state to the state with density matrix
    \[ \hat{\rho} = \hat{\rho}_A \otimes \hat{\rho}_B \]
    where \(\hat{\rho}_A\) and \(\hat{\rho}_B\) are the reduced density matrices in each system.
    Solution:
    Here we want to calculate
    \[ S(\hat{\rho} \vert\vert \hat{\rho}_A \otimes \hat{\rho}_B) = \Tr (\hat{\rho} \log \hat{\rho}) - \Tr (\hat{\rho} \log (\hat{\rho}_A \otimes \hat{\rho}_B) ) \; . \]
    Since \(\hat{\rho}\) is the density matrix for the pure state \(\ket{\Psi}\) we know that the first term, which is \(-S(\hat{\rho})\), is \(0\).
    For the second term we need to calculate the reduced density matrices. This is easy and by the symmetry both have the same form
    \[ \cos^2\theta \ket{0} \bra{0} + \sin^2\theta \ket{1}\bra{1} \]
    so we find
    \begin{eqnarray*} \hat{\rho}_A \otimes \hat{\rho}_B & = & \cos^4\theta (\ket{0} \otimes \ket{0}) (\bra{0} \otimes \bra{0}) + \cos^2\theta \sin^2\theta (\ket{0} \otimes \ket{1}) (\bra{0} \otimes \bra{1}) + \\ & & + \cos^2\theta \sin^2\theta (\ket{1} \otimes \ket{0}) (\bra{1} \otimes \bra{0}) + \sin^4\theta (\ket{1} \otimes \ket{1}) (\bra{1} \otimes \bra{1}) \; . \end{eqnarray*}
    This means that in the standard basis for the 2-qubit system
    \[ \rho_A \otimes \rho_B = \mathrm{diag} (\cos^4\theta, \cos^2\theta \sin^2\theta, \cos^2\theta \sin^2\theta, \sin^4\theta ) \; . \]
    On the other hand
    \begin{eqnarray*} \hat{\rho} & = & \ket{\Psi}\bra{\Psi} = \cos^2\theta (\ket{0} \otimes \ket{0}) (\bra{0} \otimes \bra{0}) + \cos\theta \sin\theta (\ket{0} \otimes \ket{0}) (\bra{1} \otimes \bra{1}) + \\ & & + \cos\theta \sin\theta (\ket{1} \otimes \ket{1}) (\bra{0} \otimes \bra{0}) + \sin^2\theta (\ket{1} \otimes \ket{1}) (\bra{1} \otimes \bra{1}) \; . \end{eqnarray*}
    So, in \(4 \times 4\) matrix form we have
    \begin{eqnarray*} S(\hat{\rho} \vert\vert \hat{\rho}_A \otimes \hat{\rho}_B) & = & - \Tr (\hat{\rho} \log (\hat{\rho}_A \otimes \hat{\rho}_B) ) \\ & = & - \Tr \left[ \left( \begin{array}{cccc} \cos^2\theta & 0 & 0 & \cos\theta \sin\theta \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \cos\theta \sin\theta & 0 & 0 & \sin^2\theta \end{array} \right) \times \right. \\ & & \left. \times \left( \begin{array}{cccc} \log(\cos^4\theta) & 0 & 0 & 0 \\ 0 & \log(\cos^2\theta \sin^2\theta) & 0 & 0 \\ 0 & 0 & \log(\cos^2\theta \sin^2\theta) & 0 \\ 0 & 0 & 0 & \log(\sin^4\theta) \end{array} \right) \right] \\ & = & -2 \left( \cos^2\theta \log(\cos^2\theta) + \sin^2\theta \log(\sin^2\theta) \right) \; . \end{eqnarray*}
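    As a numerical check, the following Python (numpy) sketch evaluates the relative entropy directly for an arbitrarily chosen \(\theta\) and compares with the closed-form result above (which is twice the entanglement entropy found in an earlier problem).

```python
import numpy as np

theta = 0.4                          # arbitrary angle, for illustration only
c, s = np.cos(theta), np.sin(theta)

psi = np.array([c, 0, 0, s])         # cos(theta)|00> + sin(theta)|11>
rho = np.outer(psi, psi)

# Product of the reduced density matrices, both equal to diag(cos^2, sin^2)
rho_A = np.diag([c**2, s**2])
rho_prod = np.kron(rho_A, rho_A)

# S(rho || rho_A x rho_B) = -Tr(rho log(rho_A x rho_B)) since rho is pure;
# rho_A x rho_B is diagonal, so its log is the diagonal matrix of log eigenvalues.
log_prod = np.diag(np.log2(np.diag(rho_prod)))
rel = -np.trace(rho @ log_prod)

closed_form = -2 * (c**2 * np.log2(c**2) + s**2 * np.log2(s**2))
print(rel, closed_form)
```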