Continuous time Markov chains

Let \(\mathcal{I}= \{i_0, i_1, i_2, \ldots \}\) be a countable (finite or infinite) state space. Attempting to generalise the theory of discrete time Markov chains to continuous time, we immediately notice that if \((X_t)_{t\ge0}\) is a continuous time version of a Markov chain, there is no smallest time step and therefore one cannot speak of the corresponding one-step transition matrix \(P\) anymore. Instead one has to embed the sequence \(P^0=I\), \(P^1\), \(P^2\), … into its continuous-time analogue, the semigroup.

Definitions and Intuition

A continuous time Markov chain \(X_t\) for \(t \geq 0\) is a stochastic process taking values in a countable state space \(\mathcal{I}\). This means that \((X_t)_{t \ge 0}\) is an (uncountable) collection of random variables defined on a common probability space.

Drawing intuition from discrete time Markov chains, one can think of the process \(X_t\) as follows. Starting from an initial state \(X_0 = i\), the process waits for some random amount of time before jumping to a new state \(j\), then waits again for a random amount of time until it jumps to another state \(k\) and so on. So the trajectory of the process \(t \mapsto X_t\) is a right continuous step function with jumps at the times when the process changes states.

Markov processes are characterized by the Markov property, which states that, conditional on the present, the past and the future of the process are independent. In probabilistic notation this is stated as follows. Given times \(0 < s < t\) and any set of prior times \(0 \leq s_1 < s_2 < \cdots < s_n = s\), one has that \[\label{eqn:Markovprop} \mathsf{P}\left (X_t = j \mid X_{s_1} = x_1, X_{s_2} = x_2, \ldots, X_{s_n} = x_n \right )= \mathsf{P}(X_t = j \mid X_s = x_n).\] This should hold for all possible choices of states \(x_1, x_2, \ldots, x_n \in \mathcal{I}\). The condition states that when conditioning on a sequence of past times, it is enough to condition only on the most recent one.

A Markov process is time homogeneous if for every \(0 \leq s < t\), \[\mathsf{P}\left (X_t = j \mid X_s = i \right) = \mathsf{P}\left (X_{t-s} = j \mid X_0 = i \right).\] We will only study time homogeneous Markov processes.

Suppose \(X_t\) is a time homogeneous Markov process that starts at state \(X_0 = i\). Let \(\tau\) be the time it waits until jumping to a new state \(j \neq i\). What is the distribution of \(\tau\)? Notice that \[\mathsf{P}(\tau \geq t+s \mid \tau \geq s) = \mathsf{P}(X_u = i \;\text{for}\; u \leq t+s \mid X_u = i \; \text{for}\; u \leq s).\] By the Markov property this is the same as \[\mathsf{P}(X_u = i \;\text{for}\; s \leq u \leq t+s \mid X_s = i),\] and by time homogeneity this equals \[\mathsf{P}(X_u = i \;\text{for}\; u \leq t \mid X_0 = i) = \mathsf{P}(\tau \geq t).\] Therefore \(\tau\) has the memoryless property \(\mathsf{P}(\tau \geq t+s \mid \tau \geq s) = \mathsf{P}(\tau \geq t)\). This means \(\tau\) has an exponential distribution with some rate \(\lambda_i\).

The above derivation provides good intuition for how a Markov process evolves. Starting from some initial state \(x_1\), the process stays there for some time \(\tau_1 \sim \mathrm{Exp}(\lambda_{x_1})\). Then it moves to a new state \(x_2 \neq x_1\) according to some probability \(p(x_1, x_2)\). It stays there for time \(\tau_2 \sim \mathrm{Exp}(\lambda_{x_2})\) and then moves to a new state \(x_3\) with probability \(p(x_2,x_3)\), and so on.

The Markov process is thus determined from the rates \(\lambda_i > 0\) for \(i \in \mathcal{I}\) as well as the probabilities \(p(i,j)\) for \(i, j \in \mathcal{I}\). The probabilities \(p(i,j)\) have to satisfy \(p(i,i) = 0\) and \(\sum_j p(i,j) = 1\). We think of \(\lambda_i\) as the rate at which the process leaves state \(i\). Then it transitions to state \(j\) with probability \(p(i,j)\). So the rate at which the process transitions from state \(i\) to state \(j\) is \(q_{i,j} = \lambda_i p(i,j)\). We will see a more formal definition of the rates \(q_{i,j}\) below.
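This description translates directly into simulation. The following sketch (plain Python; the three states, rates, and jump probabilities are hypothetical, chosen only for illustration) generates one trajectory by alternating \(\mathrm{Exp}(\lambda_i)\) holding times with jumps drawn according to \(p(i,\cdot)\):

```python
import random

# Hypothetical 3-state chain: holding rates lam[i] and jump probabilities
# p[i][j]; each inner dict omits i itself (so p(i,i) = 0) and sums to 1.
lam = {"A": 3.0, "B": 1.0, "C": 2.0}
p = {
    "A": {"B": 1/3, "C": 2/3},
    "B": {"C": 1.0},
    "C": {"A": 0.5, "B": 0.5},
}

def simulate(start, t_max, rng):
    """One trajectory up to time t_max: jump times and states entered."""
    t, state = 0.0, start
    times, states = [0.0], [start]
    while True:
        t += rng.expovariate(lam[state])        # Exp(lambda_i) holding time
        if t >= t_max:
            return times, states
        targets = list(p[state])
        weights = [p[state][j] for j in targets]
        state = rng.choices(targets, weights=weights)[0]  # jump w.p. p(i, j)
        times.append(t)
        states.append(state)

rng = random.Random(1)
times, states = simulate("A", 10.0, rng)
```

Because each jump distribution excludes the current state, consecutive states always differ, matching the right continuous step-function picture of the trajectory.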

Transition Probability and Chapman-Kolmogorov Equations

A discrete time Markov chain \(X_0, X_1, \ldots\) with state space \(\mathcal{I}\) is determined by its transition matrix \(P = \bigl(p_{i,j}\bigr)_{i,j \in \mathcal{I}}\). There is an analogue of this in the continuous time case. For every \(t \geq 0\), define the matrix \[P_t = \bigl( p_{i,j}(t)\bigr)_{i,j \in \mathcal{I}}\] as \[\label{eqn:Pmatrix} p_{i,j}(t) = \mathsf{P}(X_t = j \mid X_0 = i),\] which is the transition probability to go from state \(i\) at time 0 to state \(j\) at time \(t\). By time homogeneity, \(p_{i,j}(t) = \mathsf{P}(X_{t+s} = j \mid X_s =i)\) for every \(s \geq 0\). By convention, \(P_0 = I\) is the identity matrix.

The matrices \(P_t\) are called the transition matrices of the process \(X_t\). They are stochastic matrices in that \[(P_t)_{i,j} = p_{i,j}(t) \geq 0 \quad \text{and}\quad \sum_{j \in \mathcal{I}} (P_t)_{i,j} = \sum_j p_{i,j}(t) = 1.\]

In the discrete time case, the \(n\)-step transition probability is given by the \(n\)-th matrix power of \(P\). There is an analogous formula for the continuous time transition probabilities, which is called the Chapman-Kolmogorov equation: \[\label{eqn:CK} P_{t+s} = P_t P_s \quad \text{for every}\; s,t \geq 0.\] The right hand side is matrix multiplication, meaning that \[(P_tP_s)_{i,j} = \sum_{k \in \mathcal{I}} p_{i,k}(t) p_{k,j}(s).\]

To prove the formula [eqn:CK] we use conditional probability. For any \(i,j \in \mathcal{I}\), \[(P_{t+s})_{i,j} = p_{i,j}(t+s) = \mathsf{P}(X_{t+s}=j \mid X_0 = i).\] Condition on the value of the process at time \(t\): by the law of total probability \[\mathsf{P}(X_{t+s}=j \mid X_0 = i) = \sum_{k \in \mathcal{I}} \mathsf{P}(X_{t+s}=j, X_t = k \mid X_0 = i).\] Now, \[\mathsf{P}(X_{t+s}=j, X_t = k \mid X_0 = i) = \mathsf{P}(X_{t+s}=j \mid X_t=k, X_0 = i) \mathsf{P}(X_t = k \mid X_0 = i).\] Note \(\mathsf{P}(X_t = k \mid X_0 = i) = p_{i,k}(t)\). By the Markov property [eqn:Markovprop] and time homogeneity, \[\mathsf{P}(X_{t+s}=j \mid X_t=k, X_0 = i) = \mathsf{P}(X_{t+s}=j \mid X_t = k) = \mathsf{P}(X_s=j \mid X_0 = k) = p_{k,j}(s).\] Plugging this into the equation above we get \[(P_{t+s})_{i,j} = \mathsf{P}(X_{t+s}=j \mid X_0 = i) = \sum_{k \in \mathcal{I}} p_{k,j}(s)p_{i,k}(t) = \sum_{k \in \mathcal{I}} p_{i,k}(t)p_{k,j}(s) = (P_t P_s)_{i,j}.\]
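For a finite state space the semigroup identity can also be checked numerically. The sketch below (plain Python; the \(3\times3\) generator and the times \(t,s\) are arbitrary choices) computes \(P_t\) as a truncated power series for \(e^{tQ}\), anticipating the \(Q\)-matrix formalism of the next section, and compares \(P_{t+s}\) with \(P_tP_s\):

```python
# Arbitrary 3x3 generator with zero row sums (illustrative choice).
Q = [[-3.0, 1.0, 2.0],
     [0.0, -1.0, 1.0],
     [1.0, 1.0, -2.0]]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, t, terms=60):
    """Truncated power series e^{tA} = sum_{n>=0} (t^n / n!) A^n."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0 = I
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[t / k * x for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

t, s = 0.3, 0.7
lhs = mat_exp(Q, t + s)                       # P_{t+s}
rhs = mat_mul(mat_exp(Q, t), mat_exp(Q, s))   # P_t P_s
err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
```

The maximal entrywise discrepancy `err` sits at the level of floating-point rounding, and each row of the computed \(P_{t+s}\) sums to 1 as a stochastic matrix should.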

\(Q\)-matrices

Let \((X_t)_{t\ge0}\) be a continuous time Markov process on \(\mathcal{I}\). In the previous section we found that the transition matrices \(P_t\) satisfy the property \(P_{t+s} = P_tP_s\). This is called the semigroup property. It is analogous to the relation \(P^{m+n} = P^m P^n\) for discrete time Markov chains. The natural question is now whether we can express \(P_t\) as a continuous-time analogue of matrix powers. To gain some intuition, note that the exponential function \(e(t) = e^{\lambda t}\) satisfies \(e(t+s) = e(t) e(s)\). Thus, we may hope to express \(P_t\) in the matrix form \(P_t = e^{t Q}\) with \(Q = \frac{d}{dt} P_t \big|_{t=0}\). This motivates the following.

For \(i, j \in \mathcal{I}\), \(i\neq j\), consider the transition probability \[p_{i,j}(h):=\mathsf{P}(X_{t+h}=j\mid X_t=i)\,,\] where \(h\) is close to 0. If \(p_{i,j}(t)\) were differentiable (recall that \(p_{i,j}(0)=0\) for \(i \neq j\)), one could define \[q_{i,j}:=\lim_{h\searrow0}\frac1h\,\mathsf{P}(X_{t+h}=j \mid X_t=i)=\lim_{h\searrow0}\frac{p_{i,j}(h)-p_{i,j}(0)}{h-0}=p'_{i,j}(0)\,.\] As before, we then can write \[\label{eq:ctMC-local-Markov-property-def} \mathsf{P}(X_{t+h}=j\mid X_t=i)=q_{i,j}h+o(h)\,,\] where \(o(h)\) decays faster than \(h\) in the limit of small \(h\) (i.e., \(o(h)/h\to0\) as \(h\to0\)).

Notice that since the transition probabilities sum to 1, we have \[\mathsf{P}(X_{t+h}=i\mid X_t=i)=1-\sum_j q_{i,j}\,h+o(h)\,,\] where the sum runs over all \(j \in \mathcal{I}\) such that \(j \neq i\). Using the notation \[q_{i,i}:=-\sum_{j:j \neq i} q_{i,j}\,,\] we can rewrite the last relation as \[\mathsf{P}(X_{t+h}=i\mid X_t=i)=1+q_{i,i}\,h+o(h)\,.\] Notice that, as above, the function \(p_{i,i}(h):=\mathsf{P}(X_{t+h}=i \mid X_t=i)\) satisfies \(q_{i,i}=p'_{i,i}(0)\).

The matrix \(Q=\bigl(q_{i,j} \bigr)_{i,j \in \mathcal{I}}\) contains all information about the time evolution of \(X_t\). It is often called the generator of the Markov process \(X_t\).

A matrix \(Q=\bigl(q_{i,j}\bigr)_{i,j\in \mathcal{I}}\) is called a \(Q\)-matrix, if:

  1. \(q_{i,j}\ge0\) for all \(i\neq j\) in \(\mathcal{I}\);

  2. \(0\le-q_{i,i}<\infty\) for all \(i \in \mathcal{I}\);

  3. \(\sum_j q_{i,j}=0\) for all \(i\in \mathcal{I}\).

Note that \(q_{i,j}\) with \(i \neq j\) is the rate of transition \(i \to j\), while \(-q_{i,i}\) is the rate of leaving state \(i\).
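The three defining conditions are mechanical to verify; a minimal sketch (matrices as plain lists of rows; the two sample matrices are illustrative, the first being a Poisson generator truncated to three states with the last state made absorbing):

```python
def is_q_matrix(Q, tol=1e-12):
    """Check the three defining conditions of a Q-matrix."""
    n = len(Q)
    off_diag_ok = all(Q[i][j] >= 0 for i in range(n) for j in range(n) if i != j)
    diag_ok = all(-float("inf") < Q[i][i] <= 0 for i in range(n))   # 0 <= -q_ii < inf
    rows_ok = all(abs(sum(Q[i])) <= tol for i in range(n))          # zero row sums
    return off_diag_ok and diag_ok and rows_ok

poisson_Q = [[-2.0, 2.0, 0.0],      # Poisson(2) generator truncated to 3 states
             [0.0, -2.0, 2.0],
             [0.0, 0.0, 0.0]]       # last state made absorbing in the truncation
not_Q = [[-1.0, 2.0],
         [0.5, -1.0]]               # row sums do not vanish
```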

Example 1. The Poisson process with rate \(\lambda>0\) is a Markov chain with \(Q\)-matrix \(Q=\bigl(q_{i,j}\bigr)_{i,j\ge0}\) whose non-vanishing entries are given by \(q_{j,j+1}=\lambda\), \(q_{j,j}=-\lambda\) for all \(j\ge0\). Recall that \(\mathsf{P}(X_t=k|X_0=0)=e^{-\lambda t}\tfrac{(\lambda t)^k}{k!}\) for \(k\ge0\), i.e., \((X_t\mid X_0=0)\sim\mathsf{Poi}(\lambda t)\).

Example 2. The pure birth process, also known as the Yule process, \((X_t)_{t\ge0}\) in \(\mathcal{I}=\{1,2,\dots\}\) describes a population of (immortal) individuals, such that in each time interval \((t,t+h)\) each individual, independently of their own past and the behaviour of other individuals, gives birth to one child with probability \(\lambda h+o(h)\) (while the probability of giving birth to more than one individual is \(o(h)\)). Under these assumptions, \[\begin{aligned} \mathsf{P}\bigl(X_{t+h}=n\mid X_t=n\bigr)&=\mathsf{P}\bigl(\text{no births in $(t,t+h)$}\mid X_t=n\bigr) \\ &=\bigl(1-\lambda h+o(h)\bigr)^n=1-n\lambda h+o(h)\,. \end{aligned}\] Similarly, \[\begin{aligned} \mathsf{P}\bigl(X_{t+h}=n+1\mid X_t=n\bigr)&=n\bigl(\lambda h+o(h)\bigr)\,\bigl(1-\lambda h+o(h)\bigr)^{n-1} \\[.5ex] &=n\lambda h(1-\lambda h)^{n-1}+o(h)=n\lambda h+o(h)\,, \end{aligned}\] and, for \(k>1\), \[\begin{aligned} \mathsf{P}\bigl(X_{t+h}=n+k\mid X_t=n\bigr)&\approx \binom{n}{k}\bigl(\lambda h+o(h)\bigr)^k\,\bigl(1-\lambda h+o(h)\bigr)^{n-k} \\ &=\binom{n}{k}(\lambda h)^k(1-\lambda h)^{n-k}+o(h)=o(h)\,. \end{aligned}\] Consequently, the corresponding \(Q\)-matrix \(Q=(q_{i,j})_{i,j\ge1}\) has the following non-vanishing entries: \(q_{j,j+1}=-q_{jj}=j\lambda\), \(j\ge1\). We will see below that \[\mathsf{P}\bigl(X_t=k\mid X_0=1\bigr)=e^{-\lambda t}\bigl(1-e^{-\lambda t}\bigr)^{k-1}\,,\qquad k\ge1\,,\] so that \((X_t\mid X_0=1)\sim\mathsf{Geom}(e^{-\lambda t})\).

It is instructive to draw a diagram (directed graph) for the Markov chain under consideration, whose arrows correspond to jumps between distinct states and the arrow labels indicate the intensity for each jump. Notice that since the row sum of each \(Q\)-matrix vanishes, there is no need to indicate the self-arrows, related to the diagonal entries of \(Q\).

We now describe exponentials of the \(Q\)-matrices. To this end, notice that if \(A\) is an arbitrary (finite) square matrix, then \[e^{tA}:=\sum_{n\ge0}\frac{t^n}{n!}\,A^n\] is well defined for all finite \(t\). Furthermore, if \(A_1\) and \(A_2\) commute, it is easy to check that \[e^{A_1+A_2}=e^{A_1}\,e^{A_2}\,.\] Consequently, if \(P\) is a (stochastic) transition matrix as above, and \(Q\) is such that \(e^Q=P\), then the family \(\bigl(e^{tQ}\bigr)_{t\ge0}\) interpolates between the discrete powers of \(P\), i.e., \(P^n=e^{nQ}\) with integer \(n\ge0\).

Theorem 3. For a Markov process \(X_t\), suppose the transition matrices \(P_t\) admit a \(Q\)-matrix in that \[q_{i,j} = \lim_{h \to 0} \frac{p_{i,j}(h) - p_{i,j}(0)}{h}\] exists for every \(i,j \in \mathcal{I}\). Assume also that \(\mathcal{I}\) is finite. Then it holds that \[\frac{d}{dt} P_t = P_t Q = Q P_t \quad \text{for every}\; t \geq 0.\] As a result, the transition matrices \(P_t\) can be written as \[P_t = e^{t Q}.\]

Proof. By the Chapman-Kolmogorov equation, \[p_{i,j}(t+h) = \sum_{k} p_{i,k}(t) p_{k,j}(h) = p_{i,j}(t) p_{j,j}(h) + \sum_{k: k \neq j} p_{i,k}(t) p_{k,j}(h).\] As a result, \[\frac{p_{i,j}(t+h)-p_{i,j}(t)}{h} = p_{i,j}(t) \frac{p_{j,j}(h)-1}{h} + \sum_{k: k \neq j} p_{i,k}(t) \frac{p_{k,j}(h)}{h}.\] In the limit as \(h \to 0\), \(\frac{p_{k,j}(h)}{h} \to q_{k,j}\) for \(k \neq j\). Similarly, \(\frac{p_{j,j}(h)-1}{h}\) tends to \(q_{j,j}\) by definition. Therefore, by taking the limit in \(h\) we find that \[p'_{i,j}(t) = \sum_{k} p_{i,k}(t) q_{k,j} = (P_tQ)_{i,j}.\] An analogous calculation, whereby we write \(p_{i,j}(t+h) = \sum_k p_{i,k}(h) p_{k,j}(t)\), shows that \(p'_{i,j}(t) = (QP_t)_{i,j}\).

The differential equation \(\frac{d}{dt} P_t = Q P_t\) has a unique solution given by \(P_t = e^{tQ}\) as shown in the theorem below. ◻
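For a two-state chain with \(Q=\begin{pmatrix}-a & a\\ b & -b\end{pmatrix}\) the transition function is known in closed form, \(p_{11}(t)=\tfrac{b}{a+b}+\tfrac{a}{a+b}e^{-(a+b)t}\) and so on, which gives a quick finite-difference sanity check of both identities \(\frac{d}{dt}P_t=P_tQ=QP_t\) (a numerical sketch; the rates \(a,b\) and the evaluation point are arbitrary):

```python
import math

a, b = 2.0, 3.0                    # arbitrary positive rates
Q = [[-a, a], [b, -b]]

def P(t):
    """Closed-form transition matrix of the two-state chain."""
    decay = math.exp(-(a + b) * t)
    return [[(b + a * decay) / (a + b), (a - a * decay) / (a + b)],
            [(b - b * decay) / (a + b), (a + b * decay) / (a + b)]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Central finite difference of P_t versus P_t Q and Q P_t.
t, h = 0.4, 1e-6
num_deriv = [[(P(t + h)[i][j] - P(t - h)[i][j]) / (2 * h) for j in range(2)]
             for i in range(2)]
forward = mat_mul(P(t), Q)     # P_t Q
backward = mat_mul(Q, P(t))    # Q P_t
err = max(abs(num_deriv[i][j] - forward[i][j]) +
          abs(num_deriv[i][j] - backward[i][j])
          for i in range(2) for j in range(2))
```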

Theorem 4. Given \(Q=(q_{ij})_{i,j\in \mathcal{I}}\) with finite \(\mathcal{I}\), define \(P_t = e^{tQ}\). Then the family of matrices \(\bigl(P_t\bigr)_{t\ge0}\) satisfies
1) The semigroup property, \(P_{s+t}=P_s\,P_t\) for all \(s\), \(t\ge0\).
2) The (matrix-valued) function \((P_t)_{t\ge0}\) is the unique solution to the forward Kolmogorov equation: \[\label{eq:ctMC-forward-Kolmogorov-eqn} \frac{d}{dt}P_t=P_t\,Q\,,\qquad P_0=I\,.\]
3) The function \((P_t)_{t\ge0}\) is the unique solution to the backward Kolmogorov equation: \[\label{eq:ctMC-backward-Kolmogorov-eqn} \frac{d}{dt}P_t=Q\,P_t\,,\qquad P_0=I\,.\]
4) For all integer \(k\ge0\), \[\Bigl(\frac{d}{dt}\Bigr)^k\,P_t\bigm|_{t=0}=Q^k\,.\]

Proof. Since \(sQ\) and \(tQ\) commute, we get \(P_{s+t}=P_s\,P_t\). As the radius of convergence of the Taylor expansion of \(P_t\) is infinite, the series can be differentiated term-wise, \[\frac{d}{dt}P_t=\sum_{n\ge1}\frac{t^{n-1}}{(n-1)!}\,Q^n=P_t\,Q=Q\,P_t\,;\] as a result, \(\bigl(\frac{d}{dt}\bigr)^k\,P_t\bigm|_{t=0}=Q^k\) for all integer \(k\ge0\). It thus remains to check uniqueness of solutions to [eq:ctMC-forward-Kolmogorov-eqn]-[eq:ctMC-backward-Kolmogorov-eqn].

If \(R_t\) were another solution to [eq:ctMC-forward-Kolmogorov-eqn], we would have \[\frac{d}{dt}\bigl(R_te^{-tQ}\bigr)=\bigl(R_t'-R_tQ\bigr)\,e^{-tQ}\equiv 0\,,\] so that \(R_t\equiv R_0e^{tQ}\equiv P_t\) for all \(t\ge0\). Uniqueness of solution to the backward equation [eq:ctMC-backward-Kolmogorov-eqn] follows similarly. ◻

We now apply the above result to \(Q\)-matrices. Recall that \(P=\bigl(p_{ij}\bigr)_{i,j\in \mathcal{I}}\) is a stochastic matrix, if each row of \(P\) is a probability distribution in \(\mathcal{I}\), namely: 1) \(0\le p_{ij}\) for all \(i\), \(j\in \mathcal{I}\); 2) \(\sum_{j\in \mathcal{I}}p_{ij}=1\) for all \(i\in \mathcal{I}\).

Theorem 5. Let \(\mathcal{I}\) be finite. A matrix \(Q=(q_{ij})_{i,j\in \mathcal{I}}\) is a \(Q\)-matrix if and only if \(P_t:=e^{tQ}\) is stochastic for all \(t\ge0\).

Proof. As \(P_h=I+hQ+o(h)\) in the limit of small \(h\searrow0\), for all \(h>0\) small enough and all \(i\neq j\), the conditions \(q_{ij}\ge0\) and \(p_{ij}(h)\ge0\) are equivalent. Consequently, for \(h>0\) small enough all entries of \(P_h\) are non-negative. As a result, for every fixed \(t\ge0\), there is an integer \(n>0\) so that \(P_{t/n}\) has only non-negative entries, and by the Chapman-Kolmogorov equation, so does \(P_t=\bigl(P_{t/n}\bigr)^n\). In particular, for \(i\neq j\), the conditions \(q_{ij}\ge0\) and \(p_{ij}(t)\ge0\) for all \(t\ge0\) are equivalent.

Next, if all row sums of the matrix \(Q\) vanish, the same holds for its every power, \(Q^n=\bigl(q_{ij}^{(n)}\bigr)_{i,j\in \mathcal{I}}\); this follows by a straightforward induction based upon the relation \[\sum_kq_{ik}^{(n)}=\sum_k\sum_jq_{ij}^{(n-1)}\,q_{jk}=\sum_jq_{ij}^{(n-1)}\Bigl(\sum_kq_{jk}\Bigr)=0\,.\] The Taylor expansion then implies \[\sum_jp_{ij}(t)=1+\sum_{n\ge1}\frac{t^n}{n!}\sum_jq_{ij}^{(n)}=1\] for all \(i\in \mathcal{I}\). Conversely, if \(\sum_jp_{ij}(t)\equiv1\) for all \(t\ge0\) and all \(i\in \mathcal{I}\), then \[\sum_jq_{ij}=\frac{d}{dt}\Bigm|_{t=0_+}\sum_jp_{ij}(t)=0\,,\] as claimed. ◻

Based on the previous theorem, a Markov process \(X_t\) is completely determined by its \(Q\)-matrix via the relation \(P_t = e^{tQ}\). We would like to calculate the transition probabilities from this formula. There are two ways to do this: one is by diagonalising \(Q\), the other is by using resolvents. The following examples illustrate both approaches.

Example 6. Consider a Markov chain on \(\mathcal{I}=\{A,B,C\}\) with the \(Q\)-matrix \[Q=\begin{pmatrix} -3 & 1 & 2 \\ 0 & -1 & 1 \\ 1 & 1 & -2 \end{pmatrix}.\] To find the transition probability \(p_{CC}(t)\), we argue similarly to the discrete time chains. First, we find the characteristic polynomial of \(Q\): \[\det\bigl(\lambda I-Q\bigr)=\det\begin{pmatrix} \lambda+3 & -1 & -2 \\ 0 & \lambda+1 & -1 \\ -1 & -1 & \lambda+2 \end{pmatrix}=\lambda(\lambda+2)(\lambda+4)\,,\] so that the eigenvalues are \(0\), \(-2\) and \(-4\). Therefore, \(Q\) can be diagonalised, and so each entry of \(e^{tQ}\) is a linear combination of \(e^{0\cdot t}\), \(e^{-2t}\) and \(e^{-4t}\); in other words, for some unknown parameters \(a\), \(b\), and \(c\), we have \[p_{CC}(t)=a+be^{-2t}+ce^{-4t}.\] To find \(a\), \(b\), and \(c\), we solve the following linear system: \[\begin{aligned} 1&=p_{CC}(0)=a+b+c\,,\\ -2&=q_{CC}=p'_{CC}(0)=-2b-4c\,,\\ 7&=q_{CC}^{(2)}=p''_{CC}(0)=4b+16c\,, \end{aligned}\] and finally get \(p_{CC}(t)=\tfrac38+\tfrac14e^{-2t}+\tfrac38e^{-4t}\).
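The closed form found here can be cross-checked against a truncated power series for \(e^{tQ}\) (a numerical sketch; the evaluation times and truncation depth are arbitrary choices):

```python
import math

# The Q-matrix of Example 6.
Q = [[-3.0, 1.0, 2.0],
     [0.0, -1.0, 1.0],
     [1.0, 1.0, -2.0]]

def mat_exp(A, t, terms=80):
    """Truncated power series e^{tA} = sum_n (t^n / n!) A^n."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0 = I
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[t / k * sum(term[i][m] * A[m][j] for m in range(n))
                 for j in range(n)] for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def p_CC(t):
    """Closed form derived in Example 6."""
    return 3/8 + math.exp(-2 * t) / 4 + 3 * math.exp(-4 * t) / 8

# Entry (C, C) is index (2, 2); compare at a few arbitrary times.
errs = [abs(mat_exp(Q, t)[2][2] - p_CC(t)) for t in (0.1, 0.5, 1.0, 2.0)]
```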

Resolvents

An alternative approach to finding transition probabilities uses the Laplace transform.

Laplace transform

A continuous function \(f:[0,\infty)\to\mathbb{R}\) is good if it is either bounded (for all \(t\ge0\) we have \(|f(t)|\le K\) with some finite constant \(K\)) or integrable (that is, \(\int_0^\infty|f(t)|\,dt<\infty\)) or both. If \(f\) is good, its Laplace transform \(\hat{f}:(0,\infty)\to\mathbb{R}\) is defined via \[\hat{f}(\lambda):=\int_0^\infty e^{-\lambda t}f(t)\,dt\,.\] An important example here is \(f(t)=e^{-ct}\); then \[\label{eq:ctMC-Laplace-transform-exponential-function} \hat{f}(\lambda)=\int_0^\infty e^{-\lambda t}e^{-ct}\,dt=\frac1{\lambda+c}\,.\] As with other useful transforms in mathematics, the map between good functions and their Laplace transforms is one-to-one:

Theorem 7 (Uniqueness Theorem). If \(f\) and \(g\) are good functions and \(\hat{f}(\lambda)=\hat{g}(\lambda)\) for all \(\lambda>0\), then \(f=g\).

This implies that we can invert Laplace transforms uniquely, at least when restricting attention to good functions. In particular, \[\label{eq:ctMC-Laplace-transform-correspondence-for-exponential-functions} f(t)=e^{-ct}\quad\text{ iff }\quad\hat{f}(\lambda)=\tfrac1{\lambda+c}\,.\]
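The correspondence can also be verified by direct numerical quadrature. The sketch below approximates \(\hat{f}(\lambda)=\int_0^\infty e^{-\lambda t}f(t)\,dt\) for \(f(t)=e^{-ct}\) by a composite Simpson rule on a truncated interval \([0,T]\) (the values of \(c\), \(\lambda\), \(T\) and the mesh are arbitrary choices):

```python
import math

def laplace(f, lam, T=60.0, n=100000):
    """Composite Simpson approximation of the Laplace transform on [0, T]."""
    h = T / n                      # n must be even for Simpson's rule
    total = f(0.0) + f(T) * math.exp(-lam * T)
    for k in range(1, n):
        w = 4.0 if k % 2 == 1 else 2.0
        total += w * f(k * h) * math.exp(-lam * k * h)
    return total * h / 3.0

c, lam = 1.5, 2.0
approx = laplace(lambda t: math.exp(-c * t), lam)
exact = 1.0 / (lam + c)            # the claimed transform of e^{-ct}
```

With \(T=60\) the truncated tail \(\int_T^\infty e^{-(\lambda+c)t}\,dt\) is negligible, so the quadrature agrees with \(1/(\lambda+c)\) to high accuracy.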

Resolvents of \(Q\)-matrices

To find the matrix \(P_t=e^{tQ}\) of transition probabilities it is often easier to first directly calculate its Laplace transform, the resolvent, \[\label{eq:ctMC-resolvent-def} R_\lambda=R(\lambda):=\int_0^\infty e^{-\lambda t}P_t\,dt\,,\] and then invert the obtained expression for each entry of interest using the correspondence [eq:ctMC-Laplace-transform-correspondence-for-exponential-functions].

To motivate the result, let \(A\) be a finite diagonal matrix and \(B_t:=e^{tA}\). Then [eq:ctMC-Laplace-transform-exponential-function] applies to each diagonal entry of \(B_t\) so that \[\int_0^\infty e^{-\lambda t}B_t\,dt=\int_0^\infty e^{-\lambda t}e^{tA}\,dt=\bigl(\lambda I-A\bigr)^{-1}\,.\] Consequently, at least in the situation when the \(Q\)-matrix \(Q\) of a finite Markov chain can be diagonalised, we get \[\label{eq:ctMC-resolvent-function} R_\lambda=\int_0^\infty e^{-\lambda t}P_t\,dt=\int_0^\infty e^{-\lambda t}e^{tQ}\,dt=\bigl(\lambda I-Q\bigr)^{-1}\,;\] of course, this only makes sense if \(\lambda\) is not an eigenvalue of \(Q\). However, since the Laplace transform applies to individual entries of \(P_t\), this implies that the resolvent \(R_\lambda=\bigl(r_{ij}(\lambda)\bigr)_{i,j\in \mathcal{I}}\) satisfies \[r_{ij}(\lambda):=\int_0^\infty e^{-\lambda t}p_{ij}(t)\,dt=\hat{p}_{ij}(\lambda)\,.\]

To summarize, knowing \(Q\) one can compute individual transition probabilities in \(P_t\) as follows:
1) find the eigenvalues of \(Q\) by using the characteristic polynomial \(\det(\lambda I-Q)\);
2) find the entry of interest, say \(r_{ij}(\lambda)\), of \(R_\lambda=(\lambda I-Q)^{-1}\) by using the standard linear algebra formula, \[r_{ij}(\lambda)=\bigl(R_\lambda\bigr)_{ij}=\frac{(-1)^{i+j}}{\det(\lambda I-Q)}\det(M_{ji})\,,\] where \(M_{ji}\) is obtained from \(\lambda I-Q\) by removing row \(j\) and column \(i\);
3) rewrite \(r_{ij}(\lambda)\) as a sum of partial fractions, and use [eq:ctMC-Laplace-transform-correspondence-for-exponential-functions] to invert individual terms.

Example 8. In Example 6 we had \(\det(\lambda I-Q)=\lambda(\lambda+2)(\lambda+4)\), so that \[r_{33}(\lambda)=\frac{1}{\lambda(\lambda+2)(\lambda+4)}\det\begin{pmatrix} \lambda+3 & -1 \\ 0 & \lambda+1 \end{pmatrix}=\frac{(\lambda+1)(\lambda+3)}{\lambda(\lambda+2)(\lambda+4)}\,.\] Writing this as a sum of partial fractions, \[r_{33}(\lambda)=\frac{a}{\lambda}+\frac{b}{\lambda+2}+\frac{c}{\lambda+4}\,,\] we find the unknown parameters \(a\), \(b\), and \(c\) by matching coefficients in \[a(\lambda+2)(\lambda+4)+b\lambda(\lambda+4)+c\lambda(\lambda+2)=\lambda^2+4\lambda+3\] to obtain \(r_{33}(\lambda)=\tfrac3{8\lambda}+\tfrac1{4(\lambda+2)}+\tfrac3{8(\lambda+4)}\), which by the correspondence relation [eq:ctMC-Laplace-transform-correspondence-for-exponential-functions] implies \(p_{33}(t)=\tfrac38+\tfrac14e^{-2t}+\tfrac38e^{-4t}\), as before.
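The same resolvent entry can be obtained without partial fractions by solving \((\lambda I-Q)x=e_3\) in exact rational arithmetic, since \(r_{33}(\lambda)\) is the last coordinate of the solution \(x\). A sketch at the sample value \(\lambda=\tfrac12\), which is not an eigenvalue of \(Q\):

```python
from fractions import Fraction as F

Q = [[-3, 1, 2], [0, -1, 1], [1, 1, -2]]     # the Q-matrix of Example 6

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination over the rationals."""
    n = len(A)
    M = [[F(A[i][j]) for j in range(n)] + [F(b[i])] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                M[r] = [x - M[r][col] * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

lam = F(1, 2)
A = [[lam * (i == j) - Q[i][j] for j in range(3)] for i in range(3)]
r_33 = solve(A, [0, 0, 1])[2]                # ((lam I - Q)^{-1})_{33}
expected = (lam + 1) * (lam + 3) / (lam * (lam + 2) * (lam + 4))
```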

Resolvents also have a probabilistic interpretation: suppose that we have an alarm-clock which rings, independently of the Markov chain \(X_t\), at a random time \(\xi\sim\mathsf{Exp}(\mu)\). Then by the formula of total probability, \[\mathsf{P}_i(X_\xi=j)=\int_0^\infty \mu e^{-\mu t}\,p_{ij}(t)\,dt=\mu\, r_{ij}(\mu)\,,\] in other words, \(r_{ij}(\lambda)=\hat{p}_{ij}(\lambda)=\tfrac1\lambda \mathsf{P}_i(X_\xi=j)\), where \(\xi\sim\mathsf{Exp}(\lambda)\) is an alarm-clock independent of \(X_t\).

Example 9. Let \(X_t\) be a Markov chain with state space \(\{A,B,C\}\) and the \(Q\)-matrix \[Q=\begin{pmatrix} -3 & 1 & 2 \\ 0 & -2 & 2 \\ 2 & 1 & -3 \end{pmatrix}\,.\] If \(T\sim\mathsf{Exp}(4)\) is independent of \(X_t\), find \(\mathsf{P}_C(X_T=B)\).

Solution. Here \(\mu=4\) and, by the interpretation above, \(\mathsf{P}_C(X_T=B)=\mu\,r_{CB}(\mu)\). A direct computation gives \(\det(\lambda I-Q)=\lambda(\lambda+3)(\lambda+5)\) and \[r_{CB}(\lambda)=\frac{(-1)^{3+2}}{\det(\lambda I-Q)}\det\begin{pmatrix} \lambda+3 & -1 \\ -2 & -1 \end{pmatrix}=\frac{\lambda+5}{\lambda(\lambda+3)(\lambda+5)}=\frac{1}{\lambda(\lambda+3)}\,,\] so that \(\mathsf{P}_C(X_T=B)=4\,r_{CB}(4)=\tfrac{4}{4\cdot 7}=\tfrac17\).

First hitting times

Let \(T_j:=\inf\{t>0:X_t=j\}\) be the first time we are in state \(j\). The random variable \(T_j\) is an example of a stopping time (similarly to the martingale theory above, \(T\) is a stopping time for the Markov chain \((X_t)_{t\ge0}\), if for each \(t\ge0\) the event \(\{T\le t\}\) depends only on the trajectory \((X_s)_{0\le s\le t}\)). Define, for \(i\), \(j\in \mathcal{I}\), \[F_{ij}(t)=\mathsf{P}_i(T_j\le t)\equiv\mathsf{P}(T_j\le t\mid X_0=i)\,,\] the probability that the chain \(X_t\) started from \(X_0=i\) has visited state \(j\) by time \(t\). Write \(f_{ij}(t)\) for the corresponding probability density function, so that \(F_{ij}(t)=\int_0^tf_{ij}(s)\,ds\). It might be instructive to think of the matrices \((F_{ij}(t))_{i,j\in \mathcal{I},t\ge0}\) and \((f_{ij}(t))_{i,j\in \mathcal{I},t\ge0}\).

Notice that, as in the discrete time case, \[\label{eq:ctMC-first-hitting-time-convolution} p_{ij}(t)=\int_0^tf_{ij}(s)\,p_{jj}(t-s)\,ds\,,\] which is just the formula of total probability combined with the strong Markov property, the fact that the chain starts afresh at time \(T_j\) in state \(j\) and evolves independently of everything that happened before time \(T_j\):

Strong Markov property: If \(T\) is a stopping time for a continuous time Markov chain \((X_t)_{t\ge0}\) then \((X_{T+t}\mid T<\infty,X_T=i)_{t\ge0}\) is independent of \((X_s)_{0\le s\le T}\) and has the same distribution as \((X_t\mid X_0=i)_{t\ge0}\).

The right-hand side of [eq:ctMC-first-hitting-time-convolution] is the convolution \((f_{ij}\star p_{jj})(t)\), where, in analogy to the discrete case, given two functions \(f\), \(g:[0,\infty)\to\mathbb{R}\), their convolution is defined via \((f\star g)(t):=\int_0^tf(s)g(t-s)\,ds\). We have

Theorem 10 (Convolution Theorem). If \(f\), \(g\) are good functions, then \[\label{eq:ctMC-convolution-statement} \widehat{f\star g}(\lambda)=\hat{f}(\lambda)\,\hat{g}(\lambda)\,.\]

Proof. By a straightforward integration, the left-hand side of [eq:ctMC-convolution-statement] equals \[\int_0^\infty\Bigl(\int_0^tf(s)g(t-s)\,ds\Bigr)e^{-\lambda t}\,dt=\int_0^\infty\Bigl(f(s)e^{-\lambda s}\int_s^\infty g(t-s)e^{-\lambda(t-s)}\,dt\Bigr)ds\,,\] which is just \(\hat{f}(\lambda)\,\hat{g}(\lambda)\). ◻

Applying this result to the convolution in [eq:ctMC-first-hitting-time-convolution], and re-arranging, we deduce \[\label{eq:ctMC-first-hitting-time-Laplace-transform-formula} \hat{f}_{ij}(\lambda)=\frac{r_{ij}(\lambda)}{r_{jj}(\lambda)}\,,\qquad i,j\in \mathcal{I}\,, \quad \lambda>0\,.\]

Example 11. In the setting of Example 9, we get \[\hat{f}_{CA}(\lambda)=\frac{r_{CA}(\lambda)}{r_{AA}(\lambda)}=\frac{\det\begin{pmatrix} 0 & \lambda+2 \\ -2 & -1 \end{pmatrix}}{\det\begin{pmatrix} \lambda+2 & -2 \\ -1 & \lambda+3 \end{pmatrix}}=\frac{2(\lambda+2)}{\lambda^2+5\lambda+4}\,.\] Consequently, \[\hat{f}_{CA}(\lambda)=\frac{2}{3(\lambda+1)}+\frac{4}{3(\lambda+4)}\quad\Longrightarrow\quad f_{CA}(t)=\frac23e^{-t}+\frac43e^{-4t}\,,\quad t\ge0\,.\] As a result, \(\mathsf{P}_C(T_A\le t)=\int_0^tf_{CA}(s)\,ds=1-\tfrac23e^{-t}-\tfrac13e^{-4t}\) for all \(t\ge0\).
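The hitting-time distribution can also be checked by Monte Carlo simulation, reading the holding rates \(-q_{ii}\) and jump probabilities \(q_{ij}/(-q_{ii})\) off the \(Q\)-matrix of Example 9 (a sketch; the sample size, seed and evaluation time \(t_0=1\) are arbitrary choices):

```python
import math, random

# Holding rates and jump probabilities of the chain in Example 9.
rates = {"A": 3.0, "B": 2.0, "C": 3.0}
jumps = {"A": [("B", 1/3), ("C", 2/3)],
         "B": [("C", 1.0)],
         "C": [("A", 2/3), ("B", 1/3)]}

def hit_A_from_C(rng):
    """First time a trajectory started at C enters state A."""
    t, state = 0.0, "C"
    while state != "A":
        t += rng.expovariate(rates[state])
        targets, weights = zip(*jumps[state])
        state = rng.choices(targets, weights=weights)[0]
    return t

rng = random.Random(7)
n, t0 = 40000, 1.0
estimate = sum(hit_A_from_C(rng) <= t0 for _ in range(n)) / n
exact = 1 - (2/3) * math.exp(-t0) - (1/3) * math.exp(-4 * t0)
```

The empirical frequency of \(\{T_A\le 1\}\) agrees with the closed form \(1-\tfrac23e^{-1}-\tfrac13e^{-4}\) up to Monte Carlo error.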

As in the case of \(r_{ij}(\lambda)=\hat{p}_{ij}(\lambda)\), the Laplace transform \(\hat{f}_{ij}(\lambda)\) also admits a direct probabilistic interpretation: suppose, as before, that we have an alarm-clock which rings, independently of the Markov chain, at a random time \(\xi\sim\mathsf{Exp}(\mu)\). Then \[\begin{aligned} \mathsf{P}_i(T_j\le \xi)&=\int_0^\infty\mathsf{P}_i(T_j\le t)\mu e^{-\mu t}\,dt=\int_0^\infty\Bigl(\int_0^tf_{ij}(s)\,ds\Bigr)\mu e^{-\mu t}\,dt \\[.5ex] &=\int_0^\infty f_{ij}(s)\Bigl(\int_s^\infty \mu e^{-\mu t}\,dt\Bigr)\,ds=\int_0^\infty f_{ij}(s)e^{-\mu s}\,ds=\hat{f}_{ij}(\mu)\,, \end{aligned}\] so that \(\hat{f}_{ij}(\lambda)\) is just the probability \(\mathsf{P}_i(T_j\le \xi)\), where \(\xi\sim\mathsf{Exp}(\lambda)\) is an alarm-clock, independent of the Markov chain \(X_t\).

Example 12. In the setting of Example 9, let \(U\sim\mathsf{Exp}(2.5)\) be an alarm-clock independent of the Markov chain \(X_t\). Then \[\mathsf{P}_B(T_C\le U)=\hat{f}_{BC}(\mu)=\frac{r_{BC}(\mu)}{r_{CC}(\mu)}=\frac{-\det\begin{pmatrix} \mu+3 & -2 \\ 0 & -2 \end{pmatrix}}{\det\begin{pmatrix} \mu+3 & -1 \\ 0 & \mu+2 \end{pmatrix}}=\frac{2}{\mu+2}\,,\] and using \(\mu=2.5\), we deduce \(\mathsf{P}_B(T_C\le U)=\tfrac49\).

The resolvent approach for infinite state spaces

This section is optional and will not be examined.

In addition to the Kolmogorov forward equation approach used in [exse:ctMC-Poisson-process] and [exse:ctMC-pure-birth-process], one can also use the resolvent method to obtain the distribution of \((X_t\mid X_0=1)\). We describe it for the pure birth process from Example 2.

Example 13. Since the \(Q\)-matrix of the pure birth process from Example 2 has a two-diagonal structure, the same is true for \(A:=\rho I-Q\). The only non-vanishing entries of this matrix are \[a_{ii}=\rho+i\lambda\,,\qquad a_{i,i+1}=-i\lambda\,,\quad i\ge1\,.\] The entries \(r_{1n}\), \(n\ge1\), of the inverse matrix \(\bigl(\rho I-Q\bigr)^{-1}\) therefore satisfy \[r_{11}(\rho+\lambda)=1\quad\text{ and }\quad r_{1,n}(\rho+n\lambda)-r_{1,n-1}(n-1)\lambda=0\,,\quad n\ge2\,,\] which implies \[r_{1n}=\frac{(n-1)\lambda}{\rho+n\lambda}r_{1,n-1}=\lambda^{n-1}(n-1)!\prod_{k=1}^n\frac{1}{\rho+k\lambda}\,.\] Using partial fractions, this can be written as \[r_{1n}=\sum_{k=1}^n\frac{\alpha_k}{\rho+k\lambda}\,,\] where the unknown coefficients \(\alpha_k\) satisfy \[\lambda^{n-1}(n-1)!=\sum_{k=1}^n\alpha_k\prod_{j\neq k}(\rho+j\lambda)\,.\] By following the same approach as in the lectures, let \(\rho=-k\lambda\) to get \[\lambda^{n-1}(n-1)!=\alpha_k\lambda^{n-1}\prod_{j\neq k}(j-k)=\alpha_k\lambda^{n-1}(-1)^{k-1}(k-1)!(n-k)!\,,\] and therefore \[\alpha_k=\binom{n-1}{k-1}(-1)^{k-1}\,.\] Finally, by inversion, the sum \[r_{1n}=\sum_{k=1}^n\binom{n-1}{k-1}(-1)^{k-1}\frac{1}{\rho+k\lambda}\] becomes \[p_{1n}(t)=\sum_{k=1}^n\binom{n-1}{k-1}(-1)^{k-1}e^{-k\lambda t}\equiv e^{-\lambda t}\sum_{k=0}^{n-1}\binom{n-1}{k}(-1)^{k}e^{-k\lambda t}\,,\] which is just \(p_{1n}(t)=e^{-\lambda t}(1-e^{-\lambda t})^{n-1}\), as before.
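The inversion at the last step is easy to verify numerically: for each \(n\) the alternating sum over \(k\) should collapse to the geometric form, and the resulting probabilities should sum to 1 (a quick sketch; the values of \(\lambda\) and \(t\) are arbitrary):

```python
import math

def p1n_sum(n, lam, t):
    """Alternating sum obtained from the partial-fraction inversion."""
    return sum(math.comb(n - 1, k - 1) * (-1) ** (k - 1) * math.exp(-k * lam * t)
               for k in range(1, n + 1))

def p1n_geom(n, lam, t):
    """Closed geometric form e^{-lam t}(1 - e^{-lam t})^{n-1}."""
    return math.exp(-lam * t) * (1 - math.exp(-lam * t)) ** (n - 1)

lam, t = 1.3, 0.8
errs = [abs(p1n_sum(n, lam, t) - p1n_geom(n, lam, t)) for n in range(1, 12)]
total = sum(p1n_geom(n, lam, t) for n in range(1, 2000))   # should be ~1
```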

Remark 1. It is instructive to use this method for the Markov chain of Example 1 (the Poisson process of rate \(\lambda\)).

Long-term behaviour and invariant distribution

We now look at the long-term behaviour of continuous time Markov chains. Our assumptions will be similar to the discrete time case, namely, throughout this section we shall assume that the Markov chain under consideration is

irreducible:

the chain can get from any state to any other state in finite time;

recurrent:

with probability one, the irreducible Markov chain under consideration returns to any state in finite time;

non-explosive:

if the state space \(\mathcal{I}\) is infinite (\(|\mathcal{I}|=\infty\)), with probability one, the Markov chain does not escape to infinity in finite time.

The last condition rules out the situation when some trajectories of an infinite state Markov chain can reach infinity in finite time, recall [exse:finiteness-of-inhomogeneous-sum-of-exponentials].

The invariant distribution

In the discrete time case a measure \(\boldsymbol{\pi}\) is a stationary distribution of a Markov chain on \(\mathcal{I}\) with transition matrix \(P\), if \(\boldsymbol{\pi}=\boldsymbol{\pi }P\), and therefore \(\boldsymbol{\pi}=\boldsymbol{\pi }P^n\), for all integer \(n\ge0\). Similarly, \(\boldsymbol{\pi}\) is an invariant distribution of a continuous time Markov chain with transition matrix function \((P_t)_{t\ge0}\), if \[\label{eq:ctMC-invariant-distribution-via-transition-function} \boldsymbol{\pi }P_t=\boldsymbol{\pi}\qquad\text{ for all $t\ge0$\,.}\] If the last equation can be differentiated with respect to time \(t\) (e.g., when the state space \(\mathcal{I}\) is finite), then \(\boldsymbol{\pi }P'_t=\boldsymbol{0}\), which at \(t=0\) becomes \[\label{eq:ctMC-invariant-distribution-via-Qmatrix} \boldsymbol{\pi }Q=\boldsymbol{0}\,.\] In some cases (e.g., when \(|\mathcal{I}|<\infty\)) one can show that the last two conditions are equivalent.

Theorem 14. Suppose \((X_t)_{t\ge0}\) satisfies the conditions above and \(\boldsymbol{\pi}\) is a probability distribution which solves the equations \[\sum_{i\in \mathcal{I}}\pi_i\,q_{ij}=0\qquad\text{ for all $j\in \mathcal{I}$\,.}\] Then \[\lim_{t\to\infty}p_{ij}(t)=\pi_j\qquad\text{ for all $i$, $j\in \mathcal{I}$\,.}\] Furthermore, if \(V_j(t)=\int_0^t\mathbf{1}_{X_s=j}\,ds\) is the total time spent by \(X_s\) in state \(j\) up to time \(t\), we have \[\mathsf{P}\bigl(\lim_{t\to\infty}\tfrac1tV_j(t)=\pi_j\bigr)=1\,.\]

Notice that the theorem implies that the invariant distribution \(\boldsymbol{\pi}\) is unique and does not depend on the initial state (or distribution) of the Markov chain.
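For a finite chain the conditions \(\boldsymbol{\pi}Q=\boldsymbol{0}\) and \(\sum_i\pi_i=1\) form a small linear system. The sketch below solves it in exact rational arithmetic for the chain of Example 6 and recovers \(\pi_C=\tfrac38\), in agreement with the limit of \(p_{CC}(t)\) found there:

```python
from fractions import Fraction as F

Q = [[-3, 1, 2], [0, -1, 1], [1, 1, -2]]   # the Q-matrix of Example 6
n = 3

# The equations (pi Q)_j = 0 are linearly dependent (row sums of Q vanish),
# so replace the last one by the normalisation sum_i pi_i = 1.
A = [[F(Q[i][j]) for i in range(n)] for j in range(n)]   # row j = column j of Q
A[n - 1] = [F(1)] * n
b = [F(0)] * (n - 1) + [F(1)]

def solve(A, b):
    """Gauss-Jordan elimination over the rationals."""
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                M[r] = [x - M[r][col] * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

pi = solve(A, b)   # invariant distribution (pi_A, pi_B, pi_C)
```

Dropping one redundant balance equation in favour of the normalisation is the standard trick; any of the \(n\) equations may be replaced.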

Convergence to the invariant distribution

We will prove that a Markov process converges to its invariant distribution \(\boldsymbol{\pi}\) under a suitable irreducibility assumption.

A Markov process \(X_t\) with a \(Q\)-matrix is irreducible if for every pair of states \(i, j \in \mathcal{I}\) there is a finite sequence of states \(i = i_1, i_2, \ldots, i_n = j\) such that \(q_{i_k,i_{k+1}} > 0\) for every \(k = 1, 2, \ldots, n-1\).

Theorem 15. Suppose a Markov process \(X_t\) with a \(Q\)-matrix is irreducible in the above sense and has an invariant distribution \(\boldsymbol{\pi}\). Then \(\boldsymbol{\pi}\) is its unique invariant distribution and \(p_{i,j}(t) \to \pi_j\) as \(t \to \infty\) for every pair \(i,j \in \mathcal{I}\).

The proof of this theorem will be deduced from the convergence theorem for discrete time Markov chains. Recall that a discrete time Markov chain \(X_0, X_1, X_2, \ldots\) with transition matrix \(P\) converges to its unique invariant distribution \(\boldsymbol{\pi}\) if it is irreducible and aperiodic.

Lemma 16. Let \(X_t\) be a continuous time Markov process with a \(Q\)-matrix. Then for every \(i\), \[p_{i,i}(t) \geq e^{q_{i,i}t}.\]

Proof. The Markov process exits the state \(i\) at rate \(-q_{i,i}\). So the probability that it stays at state \(i\) for \(t\) units of time is at least the probability that an \(\mathrm{Exp}(-q_{i,i})\) random variable exceeds \(t\), and the latter probability is \(e^{q_{i,i}t}\). ◻

Lemma 17. Let \(X_t\) be an irreducible Markov process. Then for every \(t > 0\), \(p_{i,j}(t) > 0\) for every \(i,j\).

Proof. If \(i=j\) then \(p_{i,i}(t) \geq e^{q_{i,i}t} > 0\) for every \(t\) by Lemma 16. So suppose \(i \neq j\). Then \(p_{i,j}(t+s) \geq p_{i,j}(t) p_{j,j}(s) \geq p_{i,j}(t)\, e^{q_{j,j}s}\), again by Lemma 16. So it is enough to show that \(p_{i,j}(t) > 0\) for all sufficiently small values of \(t\).

By irreducibility, there are states \(i = i_1 , i_2, \ldots, i_n = j\) such that \(q_{i_k,i_{k+1}} > 0\). Note that \(\lim_{t \to 0} p_{i_k, i_{k+1}}(t)/ t = q_{i_k, i_{k+1}} > 0\), so that \(p_{i_k, i_{k+1}}(t) > 0\) for all small values of \(t\). Now, \[p_{i,j}(t) \geq p_{i_1,i_2}(t/n) p_{i_2,i_3}(t/n) \cdots p_{i_{n-1}, i_n}(t/n).\] The terms on the right hand side are positive for all small values of \(t\) by the above observation. So \(p_{i,j}(t)\) is also positive for all small values of \(t\) as required. ◻

We can now prove the convergence theorem to the invariant distribution.

Proof. First we show that the invariant distribution \(\boldsymbol{\pi}\) of \(P_t\) is unique. For \(h > 0\) consider the discrete time Markov chain with transition matrix \(P = P_h\), that is, \(P_{i,j} = p_{i,j}(h)\). By Lemma 17, the entries of \(P\) are all positive, so the chain is irreducible and aperiodic. Thus it has a unique invariant distribution. But note that \(\boldsymbol{\pi}\) is an invariant distribution of \(P\) since \(\boldsymbol{\pi }P = \boldsymbol{\pi }P_h = \boldsymbol{\pi}\). So \(\boldsymbol{\pi}\) must be the unique invariant distribution of \(P_t\).

Now, with \(P = P_h\) as before, we can use the convergence theorem for discrete time Markov chains. We find that \[P^n_{i,j} = p_{i,j}(nh) \to \pi_j\] as \(n \to \infty\) for every \(i,j\).

For an arbitrary time \(t > 0\) we can find \(n\) such that \(nh \leq t < (n+1)h\). Observe that \(p_{i,j}(t) \geq p_{i,j}(nh) p_{j,j}(t-nh)\), and \(p_{j,j}(t-nh) \geq e^{q_{j,j}(t-nh)} \geq e^{q_{j,j}h}\) by Lemma 16. So \[p_{i,j}(t) \geq e^{q_{j,j}h} p_{i,j}(nh).\] In the same way, \(p_{i,j}((n+1)h) \geq p_{i,j}(t) e^{q_{j,j}h}\). Together these bounds imply that \[p_{i,j}(nh) e^{q_{j,j}h} \leq p_{i,j}(t) \leq p_{i,j}((n+1)h)e^{- q_{j,j}h}.\] As \(t \to \infty\), \(n\) tends to \(\infty\) as well. Thus, taking the \(\liminf\) and \(\limsup\) above we find that \[\pi_j e^{q_{j,j}h} \leq \liminf_{t \to \infty} p_{i,j}(t) \leq \limsup_{t \to \infty} p_{i,j}(t) \leq \pi_j e^{-q_{j,j}h}.\] Now we let \(h \to 0\) to conclude that \(p_{i,j}(t) \to \pi_j\). ◻

Detailed balance

In analogy to the discrete time case, a measure \(\boldsymbol{m}=(m_i)_{i\in \mathcal{I}}\) is in detailed balance with the \(Q\)-matrix \(Q\) of a continuous time Markov chain, if \[\label{eq:ctMC-detailed-balance-equations} m_i\,q_{ij} = m_j\,q_{ji}\qquad\text{ for all $i$, $j\in \mathcal{I}$\,.}\]

Theorem 18. If \(\boldsymbol{m}=(m_i)_{i\in \mathcal{I}}\) is in detailed balance with a \(Q\)-matrix \(Q\), and \(M:=\sum_{i\in \mathcal{I}}m_i\) is finite, then \(\pi_i:=m_i/M\) defines an invariant distribution for the Markov process in question.

Proof. For fixed \(j\in \mathcal{I}\), \[(\boldsymbol{\pi }Q)_j=\sum_i\pi_iq_{ij}=\frac1M\sum_im_iq_{ij}=\frac{1}{M}\sum_im_jq_{ji}=\frac{m_j}{M}\sum_iq_{ji}=0\,,\] so that \(\boldsymbol{\pi}\) is invariant for \(Q\). ◻

Notice that, as in the discrete case, the invariant distribution might exist even if the detailed balance equations [eq:ctMC-detailed-balance-equations] do not hold.

Example 19. The M/M/1 queue, describing a single server with service rate \(\mu\) and arrival rate \(\lambda\), is a continuous time Markov chain with state space \(\mathcal{I}=\{0,1,\dots\}\) and \(Q\)-matrix \(Q\), whose non-vanishing entries are as follows: \[q_{i,i+1}=\lambda\,,\qquad q_{i+1,i}=\mu\,,\qquad q_{ii}=-(\lambda+\mu)+\mu\delta_{0i}\,,\qquad i\in \mathcal{I}\,.\] The detailed balance equations here read \(m_i\lambda=m_{i+1}\mu\), which by a straightforward induction implies \(m_i=\rho^i m_0\), where \(\rho=\tfrac\lambda\mu\) denotes the traffic intensity. If \(\rho<1\), we have \[M=\sum_im_i=m_0\sum_{i\ge0}\rho^i=\frac{m_0}{1-\rho}<\infty\,,\] implying that \(\pi_i=(1-\rho)\rho^i\), \(i\ge0\), a shifted geometric distribution with success probability \(1-\rho\). A straightforward computation shows that the average queue length in stationarity is \(\sum_i i\pi_i=\tfrac1{1-\rho}-1=\tfrac\rho{1-\rho}\).
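The computations in this example are easy to verify numerically. The sketch below uses illustrative rates \(\lambda=2\), \(\mu=3\) (so \(\rho=\tfrac23<1\)) and truncates the geometric tail at a level where the remaining mass is negligible; it checks detailed balance and the average queue length \(\rho/(1-\rho)\).

```python
lam, mu = 2.0, 3.0            # illustrative arrival and service rates
rho = lam / mu                # traffic intensity, here < 1
N = 200                       # truncation level; the tail mass beyond N is ~rho^N
pi = [(1 - rho) * rho**n for n in range(N)]

total_mass = sum(pi)          # should be ~1
# largest violation of the detailed balance equations pi_n*lam = pi_{n+1}*mu
balance_gap = max(abs(pi[n] * lam - pi[n + 1] * mu) for n in range(N - 1))
mean_queue = sum(n * p for n, p in enumerate(pi))   # should be ~rho/(1-rho) = 2
```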

Example 20. As in the discrete time situation, the birth and death processes are defined in \(\mathcal{I}=\{0,1,2,\dots\}\) and jump between neighbouring integers. The corresponding \(Q\)-matrix has a three-diagonal structure, and its non-vanishing entries are, for \(n\in \mathcal{I}\), \[q_{n,n-1}=\mu_n\,\mathbf{1}_{n>0}\,,\qquad q_{n,n}=-(\lambda_n+\mu_n)\,,\qquad q_{n,n+1}=\lambda_n\,.\] Here, \(q_{n,n+1}\) is the birth rate and \(q_{n,n-1}\) is the death rate. To find the invariant distribution, notice that, by detailed balance, \(\pi_n=\tfrac{\lambda_{n-1}}{\mu_n}\pi_{n-1}\) and therefore \[\pi_n=\pi_0\prod_{k=1}^n\frac{\lambda_{k-1}}{\mu_k}\,,\qquad n\ge0\,.\] This can be normalised into a probability measure if there exists \(\pi_0>0\) for which \[\sum_{n\ge0}\pi_n=\pi_0\Bigl(\sum_{n\ge0}\prod_{k=1}^n\frac{\lambda_{k-1}}{\mu_k}\Bigr)=1\,,\] equivalently, when the factor of \(\pi_0\) is finite.

Remark 2. Many of the processes introduced above are birth-and-death processes; these include the Poisson process, the pure birth process, and various Markovian queues. It is an instructive exercise to describe all values of their parameters for which the process has an invariant distribution.

Remark 3. A similar method can be used to classify birth-and-death processes as transient or recurrent. Since the process is irreducible and each state \(n\ge0\) can be reached from state \(0\) with probability one, for recurrence it is sufficient to verify that for all \(n\ge0\) \[a_n:=\mathsf{P}_n(T_0<\infty)=1\,,\] where, as before, \(T_j\) is the first hitting time of state \(j\). One can show that the birth-and-death chain is recurrent iff \(\sum_{n=1}^\infty\prod_{k=1}^n\tfrac{\mu_k}{\lambda_k}=\infty\). For details, see [exse:recurrence-of-birth-and-death-processes].
It is an instructive exercise to apply this condition to various birth-and-death processes discussed in this section.

In some cases invariant measures can be found using generating functions:

Example 21. Since the jump rates of the M/M/1 queue from Example 19 have the same structure for all states sufficiently away from the origin, one can find the stationary distribution \(\boldsymbol{\pi}=(\pi_0,\pi_1,\dots)\) using its generating function, \[\tilde\pi(s):=\sum_{n\ge0}\pi_ns^n\,.\]

Namely, notice that the linear system \(\boldsymbol{\pi }Q=\boldsymbol{0}\), where \(Q\) is the \(Q\)-matrix of the M/M/1 queue, reads \[-\lambda\pi_0+\mu\pi_1=0\,, \qquad \lambda\pi_{n-2}-(\lambda+\mu)\pi_{n-1}+\mu\pi_n=0,\quad n\ge2\,.\] Multiplying the equation for \(n\) by \(s^n\) (and the first equation by \(s\)) and summing, we get \[\lambda s^2\tilde\pi(s)-\lambda\pi_0s-(\lambda+\mu)s(\tilde\pi(s)-\pi_0)+\mu(\tilde\pi(s)-\pi_0)=0\,,\] so that \[\tilde\pi(s)=\frac{\mu\pi_0(1-s)}{\mu-(\lambda+\mu)s+\lambda s^2}=\frac{\mu\pi_0}{\mu-\lambda s}\,.\] To find \(\pi_0\), recall that \(\boldsymbol{\pi}\) is a stochastic vector, so that \(1=\tilde\pi(1)=\mu\pi_0/(\mu-\lambda)\), and therefore, in terms of \(\rho=\lambda/\mu\), \[\tilde\pi(s)=\frac{\mu-\lambda}{\mu-\lambda s}=(1-\rho)\sum_{n\ge0}\rho^ns^n\,.\] If \(\rho<1\), this gives \(\pi_n=(1-\rho)\rho^n\), \(n\ge0\), as in Example 19. The average queue length in equilibrium can be found from a direct differentiation, \[\sum_{n\ge0}n\pi_n=\tilde\pi'(s)\Bigm|_{s=1}=\frac{\rho(1-\rho)}{(1-\rho s)^2}\Bigm|_{s=1}=\frac{\rho}{1-\rho}\,,\] agreeing with the result in Example 19.
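The closed form of the generating function can be sanity-checked against direct partial sums; the rates below are illustrative.

```python
lam, mu = 1.0, 2.0            # illustrative rates, rho = 1/2
rho = lam / mu
s = 0.7                       # any point inside the radius of convergence

# partial sum of (1-rho) * rho^n * s^n vs the closed form (mu-lam)/(mu-lam*s)
direct = sum((1 - rho) * rho**n * s**n for n in range(500))
closed = (mu - lam) / (mu - lam * s)

# mean queue length: derivative of the generating function at s = 1,
# which should equal rho/(1-rho)
mean_queue = sum(n * (1 - rho) * rho**n for n in range(500))
```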

Remark 4. Recall that the M/M/\(k\) queue describes a supermarket with \(k\ge1\) cashiers. It has arrival rate \(\lambda\), while its service rate is \(\mu n\) if \(n\le k\) and \(\mu k\) for \(n>k\). It is instructive to use the detailed balance equations to find the stationary distribution for this Markov chain, see [exse:ctMC-MMk-queue]; what is its condition of stability? What is the stationary distribution of the M/M/\(\infty\) queue?

Remark 5. You might also wish to use the generating functions method to find the stationary distribution of the M/M/\(2\) and M/M/\(3\) queues. Check that your result coincides with that obtained in the previous remark.

Problems

 [exse:ctMC-transition-probabilities-vs-Qmatrix] (*).

Suppose there exists a Markov chain \((X_t)_{t\ge0}\) with state space \(\mathcal{I}=\{1,2\}\) such that, for all \(t>0\), \(p_{12}(t)=\mathsf{P}(X_t=2\mid X_0=1)=\tfrac12-\tfrac12\,e^{-2t}\) and \(p_{21}(t)=\mathsf{P}(X_t=1\mid X_0=2)=\tfrac12-\tfrac12\,e^{-2t}\). Find the corresponding \(Q\)-matrix and the \(P_t\) matrix.

Notice that \(p_{11}(t)\equiv1-p_{12}(t)\), similarly, \(p_{22}(t)\equiv1-p_{21}(t)\) for all \(t\ge0\). Recalling that \(q_{ij}=\tfrac{d}{dt}p_{ij}(t)|_{t=0}\), we get \[P_t=\begin{pmatrix} \tfrac12+\tfrac12\,e^{-2t} & \tfrac12-\tfrac12\,e^{-2t} \\[.4ex] \tfrac12-\tfrac12\,e^{-2t} & \tfrac12+\tfrac12\,e^{-2t} \end{pmatrix}\,, \qquad Q=\begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}\,.\]
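The answer can be checked numerically (NumPy assumed): a finite difference of \(P_t\) at \(t=0\) recovers \(Q\), and \(P_t\) satisfies the Chapman–Kolmogorov (semigroup) identity \(P_{s+t}=P_sP_t\).

```python
import numpy as np

def P(t):
    # the transition matrix function found above
    e = np.exp(-2.0 * t)
    return np.array([[0.5 + 0.5 * e, 0.5 - 0.5 * e],
                     [0.5 - 0.5 * e, 0.5 + 0.5 * e]])

Q = np.array([[-1.0,  1.0],
              [ 1.0, -1.0]])

h = 1e-6
Q_fd = (P(h) - np.eye(2)) / h                        # finite difference at t = 0
gap = np.abs(P(0.3 + 0.8) - P(0.3) @ P(0.8)).max()   # semigroup property
```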

 [exse:ctMC-transition-probabilities-vs-Qmatrix-failure] (*).

Show that there is no Markov chain \((X_t)_{t\ge0}\) with state space \(\mathcal{I}=\{1,2\}\) such that, for all \(t>0\), \(p_{12}(t)=\mathsf{P}(X_t=2\mid X_0=1)=\tfrac12-\tfrac12\,e^{-2t}\) and \(p_{22}(t)=\mathsf{P}(X_t=2\mid X_0=2)=\tfrac12-\tfrac12\,e^{-2t}\).
[Hint: write the corresponding \(P_t\) matrix.]

By construction, we must have \(P_t|_{t=0}=I\), the identity matrix. Notice that here, however, \(p_{22}(0)=0\), a contradiction.

 [exse:ctMC-Qmatrix-various-probabilities] (**).

In a factory, there are two machines. Each machine will run for an exponentially distributed time with parameter \(2\) before needing repair. There are two repairmen, but only one repairman at a time can work on any given machine. Repair times are exponential with rate \(3\). Let \(X(t)\) denote the number of machines working at time \(t\).
a) Write down the \(Q\) matrix for this Markov chain.
b) At time \(0\), both machines are working. What is the probability that they remain working throughout the time interval \((0,t)\)?
c) At time \(0\), both machines are broken down. What is the probability that at time \(t\) both machines are working properly?

Let \(X_t\) be the number of machines working at time \(t\), \(X_t\in\{0,1,2\}\).
a) The non-vanishing off-diagonal entries of the \(Q\)-matrix are: \(q_{01}=6\), \(q_{10}=2\), \(q_{12}=3\) and \(q_{21}=4\), so that \[Q=\begin{pmatrix} -6 & 6 & 0 \\ 2 & -5 & 3 \\ 0 & 4 & -4 \end{pmatrix}\,.\] b) Let \(T_1\sim\mathsf{Exp}(4)\) be the first jump time out of state \(2\). Then \(\mathsf{P}_2(X_s=2\text{ for all $s \in [0,t]$})=\mathsf{P}_2(T_1>t)=e^{-4t}\).
c) By a straightforward computation, \(\det(\lambda I-Q)=\lambda(\lambda+5)(\lambda+10)\), so that \(p_{02}(t)=a+be^{-5t}+ce^{-10t}\) for some unknown constants \(a\), \(b\), \(c\). To find the latter, solve the linear system \(p_{02}(0)=0=a+b+c\), \(p'_{02}(0)=q_{02}=0=-5b-10c\), \(p''_{02}(0)=q_{02}^{(2)}=18=25b+100c\), implying that \(a=c=\tfrac9{25}\), \(b=-\tfrac{18}{25}\), i.e., \(p_{02}(t)=\tfrac9{25}-\tfrac{18}{25}e^{-5t}+\tfrac9{25}e^{-10t}\).
Remark: you might wish to check that each \(\lambda\in\{0,-5,-10\}\) turns \(\lambda I-Q\) into a degenerate matrix, i.e., is an eigenvalue of \(Q\). Also, that the trace condition \(\sum_i\lambda_i=\sum_jq_{jj}\) holds. Further, that \(a\) coincides with the \(\pi_2\) component of the stochastic vector solution \(\boldsymbol{\pi}\) to the equation \(\boldsymbol{\pi }Q=\boldsymbol{0}\).
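The spectral computation in part c) can be cross-checked against a numerically computed matrix exponential \(P_t=e^{tQ}\) (truncated power series; NumPy assumed):

```python
import numpy as np

Q = np.array([[-6.0,  6.0,  0.0],
              [ 2.0, -5.0,  3.0],
              [ 0.0,  4.0, -4.0]])

def expm(M, terms=80):
    # truncated power series for e^M; adequate for this small matrix
    out, term = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

t = 0.5
p02_closed = 9/25 - (18/25) * np.exp(-5*t) + (9/25) * np.exp(-10*t)
p02_expm = expm(Q * t)[0, 2]
eigs = sorted(np.linalg.eigvals(Q).real)   # should be 0, -5, -10
```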

 [exse:ctMC-Poisson-process] (***).

If \(X_t\) is the Poisson process from Example 1, let \(p_{jk}(t)\) be the transition probability from \(j\) to \(k\in \mathcal{I}=\{0,1,2,\dots\}\). Write the corresponding \(Q\)-matrix and the Kolmogorov forward equation for \(p_{0k}(t)\). By using induction or otherwise, show that \(p_{0k}(t)=\tfrac{(\lambda t)^k}{k!}e^{-\lambda t}\).

The only non-vanishing entries of the \(Q\) matrix are \(-q_{ii}=q_{i,i+1}=\lambda\) for integer \(i\ge0\), so that the Kolmogorov forward equation reads \(p'_{00}(t)=-\lambda p_{00}(t)\) and \(p'_{0i}(t)=-\lambda p_{0i}(t)+\lambda p_{0,i-1}(t)\) for integer \(i\ge1\) with initial condition \(p_{0i}(0)=\delta_{0i}\). You might find it convenient to rewrite this system in terms of the re-scaled functions \(\tilde p_{0i}(t):=p_{0i}(t)e^{\lambda t}\), namely, \(\tilde p'_{00}(t)=0\) and \(\tilde p'_{0i}(t)=\lambda\tilde p_{0,i-1}(t)\) for \(i\ge1\) with initial condition \(\tilde p_{0i}(0)=\delta_{0i}\).
Consequently, \(\tilde p_{00}\equiv1\), and a straightforward induction gives \(\tilde p_{0i}(t)\equiv\tfrac{(\lambda t)^i}{i!}\), implying that \(\mathsf{P}_0(X_t=i)=\tfrac{(\lambda t)^i}{i!}e^{-\lambda t}\), i.e., \((X_t\mid X_0=0)\sim\mathsf{Poi}(\lambda t)\).
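Alternatively, the forward system can be integrated numerically. The sketch below uses a plain Euler scheme with the state space truncated at \(K=20\) (for \(\lambda t=1\) the neglected mass is negligible) and compares the result with the Poisson probabilities:

```python
import math

# Euler discretisation of p'_{00} = -lam*p_{00},
# p'_{0k} = lam*(p_{0,k-1} - p_{0k}), k >= 1, truncated at state K.
lam, T, dt = 1.0, 1.0, 1e-4
K = 20
p = [1.0] + [0.0] * K               # initial condition p_{0k}(0) = delta_{0k}
for _ in range(int(T / dt)):
    new = [p[0] * (1 - dt * lam)]
    for k in range(1, K + 1):
        new.append(p[k] + dt * lam * (p[k - 1] - p[k]))
    p = new
# p[k] should now approximate exp(-lam*T) * (lam*T)^k / k!
```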

 [exse:ctMC-pure-birth-process] (***).

If \(X_t\) is the pure birth process from Example 2, let \(p_{jk}(t)\) be the transition probability from \(j\) to \(k\in \mathcal{I}=\{1,2,3,\dots\}\). Write the corresponding \(Q\)-matrix and the Kolmogorov forward equation for \(p_{1k}(t)\). By using induction or otherwise, show that \(p_{1k}(t)=e^{-\lambda t}(1-e^{-\lambda t})^{k-1}\) for integer \(k\ge1\).

The only non-vanishing entries of the \(Q\) matrix are \(-q_{ii}=q_{i,i+1}=i\lambda\) for integer \(i\ge1\), so that the Kolmogorov forward equation reads \(p'_{1i}(t)=-\lambda i p_{1i}(t)+\lambda (i-1) p_{1,i-1}(t)\) for integer \(i\ge1\) with initial condition \(p_{1i}(0)=\delta_{1i}\). In terms of the auxiliary functions \(\tilde p_{1i}(t):=p_{1i}(t)e^{\lambda it}\) this system becomes \(\tilde p'_{1i}(t)=\lambda(i-1)e^{\lambda t}\tilde p_{1,i-1}(t)\) with initial condition \(\tilde p_{1i}(0)=\delta_{1i}\).
Consequently, \(\tilde p_{11}(t)\equiv1\) and \(\tilde p'_{12}(t)=\lambda e^{\lambda t}\), implying that \(\tilde p_{12}(t)=e^{\lambda t}-1\). Similarly, \(\tilde p'_{13}(t)=2\lambda e^{\lambda t}(e^{\lambda t}-1)=2\tilde p'_{12}(t)\tilde p_{12}(t)\), leading to \(\tilde p_{13}(t)=\bigl(\tilde p_{12}(t)\bigr)^2\). This suggests that \(\tilde p_{1i}(t)=\bigl(\tilde p_{12}(t)\bigr)^{i-1}\), which can be easily checked by induction. Hence, \(p_{1i}(t)=e^{-\lambda it}\bigl(\tilde p_{12}(t)\bigr)^{i-1}=e^{-\lambda t}\bigl(1-e^{-\lambda t}\bigr)^{i-1}\), i.e., \((X_t\mid X_0=1)\sim\mathsf{Geom}(e^{-\lambda t})\), as claimed.

 [exse:ctMC-pure-death-process] (***).

The pure death process \((X_t)_{t\ge0}\) in \(\mathcal{I}=\{0,1,2,\dots\}\) describes a population of individuals, such that, for fixed \(\mu>0\), in every time interval \((t,t+h)\) each individual dies with probability \(\mu h+o(h)\), independently of their own past and the behaviour of other individuals. Let \(p_{jk}(t)\) be the corresponding transition probability from \(j\) to \(k\in \mathcal{I}=\{0,1,2,3,\dots\}\). Write the corresponding \(Q\)-matrix and the Kolmogorov backward equation for \(p_{jk}(t)\). By using induction or otherwise, show that \((X_t\,|\,X_0=N)\sim\mathsf{Bin}(N,e^{-\mu t})\).

The non-vanishing entries of the \(Q\)-matrix are \(q_{j,j-1}=-q_{j,j}=\mu j\) for \(j\ge1\) (all entries in row \(0\) vanish, so state \(0\) is absorbing), and so the Kolmogorov backward equations read \(p'_{j,k}(t)=\mu j(p_{j-1,k}(t)-p_{j,k}(t))\).
It is claimed that \(p_{N,k}(t)=\binom{N}k\,e^{-\mu tk}(1-e^{-\mu t})^{N-k}\), which follows either from the independence assumption or from an inductive argument similar to that in [exse:ctMC-pure-birth-process].

 [exse:ctMC-transition-probabilities-prob1] (***).

Let \(X\) be a Markov chain on \(\{1,2,3\}\) with \(Q\)-matrix \[Q=\begin{pmatrix} -4 & 2 & 2 \\ 1 & -3 & 2 \\ 1 & 0 & -1 \end{pmatrix}\,.\] a) Use the diagonalisation method to compute \(p_{23}(t)\).
b) Use the resolvents approach to compute \(p_{23}(t)\).
c) Solve the linear system \(\boldsymbol{\pi }Q=\boldsymbol{0}\) for a stochastic row vector \(\boldsymbol{\pi}=(\pi_1,\pi_2,\pi_3)\). Compare the value of \(\pi_3\) to \(\lim\limits_{t\to\infty}p_{23}(t)\).
d) Write the Kolmogorov forward equations for \(p_{1k}(t)\) with \(k\in \mathcal{I}\).
e) Write the Kolmogorov backward equations for \(p_{k2}(t)\) with \(k\in \mathcal{I}\).

a) A standard computation gives \(\det\bigl(\lambda I-Q\bigr)=\lambda(\lambda+3)(\lambda+5)\), so that \(p_{23}(t)=a+be^{-3t}+ce^{-5t}\) for some unknown constants \(a\), \(b\), \(c\). To find the latter, solve the following system of linear equations: \(p_{23}(0)=0=a+b+c\), \(p'_{23}(0)=q_{23}=2=-3b-5c\), \(p''_{23}(0)=q^{(2)}_{23}=-6=9b+25c\). As a result, we get \(a=\tfrac23\), \(b=-\tfrac23\), and \(c=0\), so that \(p_{23}(t)=\tfrac23-\tfrac23e^{-3t}\).
b) The general approach gives \(r_{23}(\lambda)=\hat{p}_{23}(\lambda)=\bigl((\lambda I-Q)^{-1}\bigr)_{23}=\tfrac{(-1)^{2+3}}{\det(\lambda I-Q)}\det(M_{32})\), where \(M_{32}\) is obtained from \(\lambda I-Q\) by removing row \(3\) and column \(2\). Clearly, \(\det(M_{32})=-2(\lambda+5)\), implying that \(r_{23}(\lambda)=\tfrac2{\lambda(\lambda+3)}=\tfrac2{3\lambda}-\tfrac2{3(\lambda+3)}\), which by the inversion formula (\(\tfrac1{\lambda+d}\mapsto e^{-dt}\)) gives \(p_{23}(t)=\tfrac23-\tfrac23e^{-3t}\), as before.
c) The second equation in \(\boldsymbol{\pi }Q=\boldsymbol{0}\) gives \(\pi_2=\tfrac23\pi_1\) and the last implies \(\pi_3=5\pi_2=\tfrac{10}3\pi_1\). After rescaling to achieve \(\pi_1+\pi_2+\pi_3=1\) we get \(\pi_1=\tfrac15\), \(\pi_2=\tfrac2{15}\), \(\pi_3=\tfrac23\). Notice that \(\pi_3=\lim_{t\to\infty}p_{23}(t)\).
d) The forward equations \(p_{ij}'(t)=\sum_kp_{ik}(t)\,q_{kj}\) for \(i=1\) read \(p'_{11}(t)=-4p_{11}(t)+p_{12}(t)+p_{13}(t)\), \(p'_{12}(t)=2p_{11}(t)-3p_{12}(t)\), and \(p'_{13}(t)=2p_{11}(t)+2p_{12}(t)-p_{13}(t)\).
e) The backward equations \(p_{ij}'(t)=\sum_kq_{ik}\,p_{kj}(t)\) with \(j=2\) read \(p'_{12}(t)=-4p_{12}(t)+2p_{22}(t)+2p_{32}(t)\), \(p'_{22}(t)=p_{12}(t)-3p_{22}(t)+2p_{32}(t)\) and \(p'_{32}(t)=p_{12}(t)-p_{32}(t)\).
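The answers to parts a)–c) can be cross-checked numerically (truncated-series matrix exponential; NumPy assumed):

```python
import numpy as np

Q = np.array([[-4.0,  2.0,  2.0],
              [ 1.0, -3.0,  2.0],
              [ 1.0,  0.0, -1.0]])

def expm(M, terms=80):
    # truncated power series for e^M
    out, term = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

t = 0.7
p23_closed = 2/3 - (2/3) * np.exp(-3*t)   # answer from parts a) and b)
p23_expm = expm(Q * t)[1, 2]              # 0-based indices for states 2, 3
pi = np.array([1/5, 2/15, 2/3])           # answer from part c)
```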

 [exse:ctMC-transition-probabilities-prob2] (***).

Let \(X\) be a Markov chain on \(\{1,2,3\}\) with \(Q\)-matrix \[Q=\begin{pmatrix} -6 & 6 & 0 \\ 2 & -5 & 3 \\ 0 & 4 & -4 \end{pmatrix}\,.\] a) Use the diagonalisation method to compute \(p_{23}(t)\).
b) Use the resolvents approach to compute \(p_{23}(t)\).
c) Solve the linear system \(\boldsymbol{\pi }Q=\boldsymbol{0}\) for a stochastic row vector \(\boldsymbol{\pi}=(\pi_1,\pi_2,\pi_3)\). Compare the value of \(\pi_3\) to \(\lim\limits_{t\to\infty}p_{23}(t)\).
d) Write the Kolmogorov forward equations for \(p_{1k}(t)\) with \(k\in \mathcal{I}\).
e) Write the Kolmogorov backward equations for \(p_{k2}(t)\) with \(k\in \mathcal{I}\).

a) A standard computation gives \(\det\bigl(\lambda I-Q\bigr)=\lambda(\lambda+5)(\lambda+10)\), so that \(p_{23}(t)=a+be^{-5t}+ce^{-10t}\) for some unknown constants \(a\), \(b\), \(c\). To find the latter, solve the following system of linear equations: \(p_{23}(0)=0=a+b+c\), \(p'_{23}(0)=q_{23}=3=-5b-10c\), \(p''_{23}(0)=q^{(2)}_{23}=-27=25b+100c\). As a result, we get \(a=\tfrac9{25}\), \(b=-\tfrac3{25}\), and \(c=-\tfrac6{25}\), so that \(p_{23}(t)=\tfrac9{25}-\tfrac3{25}e^{-5t}-\tfrac6{25}e^{-10t}\).
b) The general approach gives \(r_{23}(\lambda)=\hat{p}_{23}(\lambda)=\bigl((\lambda I-Q)^{-1}\bigr)_{23}=\tfrac{(-1)^{2+3}}{\det(\lambda I-Q)}\det(M_{32})\), where \(M_{32}\) is obtained from \(\lambda I-Q\) by removing row \(3\) and column \(2\). Clearly, \(\det(M_{32})=-3(\lambda+6)\), implying that \(r_{23}(\lambda)=\tfrac{3(\lambda+6)}{\lambda(\lambda+5)(\lambda+10)}=\tfrac\alpha\lambda+\tfrac\beta{(\lambda+5)}+\tfrac\gamma{(\lambda+10)}\), where the unknown constants \(\alpha\), \(\beta\) and \(\gamma\) can be directly obtained from the identity \(\alpha(\lambda+5)(\lambda+10)+\beta\lambda(\lambda+10)+\gamma\lambda(\lambda+5)\equiv3(\lambda+6)\), by respectively using \(\lambda\in\{0,-5,-10\}\). This gives \(r_{23}(\lambda)=\tfrac9{25\lambda}-\tfrac3{25(\lambda+5)}-\tfrac6{25(\lambda+10)}\), which by the inversion formula (\(\tfrac1{\lambda+d}\mapsto e^{-dt}\)) implies \(p_{23}(t)=\tfrac9{25}-\tfrac3{25}e^{-5t}-\tfrac6{25}e^{-10t}\), as before.
c) The first equation in \(\boldsymbol{\pi }Q=\boldsymbol{0}\) gives \(\pi_2=3\pi_1\) and the last implies \(\pi_3=\tfrac34\pi_2=\tfrac94\pi_1\). After rescaling to achieve \(\pi_1+\pi_2+\pi_3=1\) we get \(\pi_1=\tfrac4{25}\), \(\pi_2=\tfrac{12}{25}\), \(\pi_3=\tfrac9{25}\). Notice that \(\pi_3=\lim_{t\to\infty}p_{23}(t)\).
d) The forward equations \(p_{ij}'(t)=\sum_kp_{ik}(t)\,q_{kj}\) for \(i=1\) read \(p'_{11}(t)=-6p_{11}(t)+2p_{12}(t)\), \(p'_{12}(t)=6p_{11}(t)-5p_{12}(t)+4p_{13}(t)\), and \(p'_{13}(t)=3p_{12}(t)-4p_{13}(t)\).
e) The backward equations \(p_{ij}'(t)=\sum_kq_{ik}\,p_{kj}(t)\) with \(j=2\) read \(p'_{12}(t)=-6p_{12}(t)+6p_{22}(t)\), \(p'_{22}(t)=2p_{12}(t)-5p_{22}(t)+3p_{32}(t)\) and \(p'_{32}(t)=4p_{22}(t)-4p_{32}(t)\).

 [exse:ctMC-transition-probabilities-prob3] (***).

Let \(X\) be a Markov chain on \(\{1,2,3\}\) with \(Q\)-matrix \[Q=\begin{pmatrix} -4 & 2 & 2 \\ 1 & -3 & 2 \\ 0 & 0 & 0 \end{pmatrix}\,.\] a) Use the diagonalisation method to compute \(p_{23}(t)\).
b) Use the resolvents approach to compute \(p_{23}(t)\).
c) Solve the linear system \(\boldsymbol{\pi }Q=\boldsymbol{0}\) for a stochastic row vector \(\boldsymbol{\pi}=(\pi_1,\pi_2,\pi_3)\). Compare the value of \(\pi_3\) to \(\lim\limits_{t\to\infty}p_{23}(t)\).
d) Write the Kolmogorov forward equations for \(p_{1k}(t)\) with \(k\in \mathcal{I}\).
e) Write the Kolmogorov backward equations for \(p_{k2}(t)\) with \(k\in \mathcal{I}\).
f) If \(h_3:=\min\{t\ge0:X_t=3\}\) is the first time the chain enters state \(3\), find the probability \(\mathsf{P}_2(h_3\le t)\).

a) A standard computation gives \(\det\bigl(\lambda I-Q\bigr)=\lambda(\lambda+2)(\lambda+5)\), so that \(p_{23}(t)=a+be^{-2t}+ce^{-5t}\) for some unknown constants \(a\), \(b\), \(c\). To find the latter, solve the following system of linear equations: \(p_{23}(0)=0=a+b+c\), \(p'_{23}(0)=q_{23}=2=-2b-5c\), \(p''_{23}(0)=q^{(2)}_{23}=-4=4b+25c\). As a result, we get \(a=1=-b\) and \(c=0\), so that \(p_{23}(t)=1-e^{-2t}\).
b) The general approach gives \(r_{23}(\lambda)=\hat{p}_{23}(\lambda)=\bigl((\lambda I-Q)^{-1}\bigr)_{23}=\tfrac{(-1)^{2+3}}{\det(\lambda I-Q)}\det(M_{32})\), where \(M_{32}\) is obtained from \(\lambda I-Q\) by removing row \(3\) and column \(2\). Clearly, \(\det(M_{32})=-2(\lambda+5)\), implying that \(r_{23}(\lambda)=\tfrac2{\lambda(\lambda+2)}=\tfrac1\lambda-\tfrac1{\lambda+2}\), which by the inversion formula (\(\tfrac1{\lambda+d}\mapsto e^{-dt}\)) gives \(p_{23}(t)=1-e^{-2t}\), as before.
c) The first two equations in \(\boldsymbol{\pi }Q=\boldsymbol{0}\), namely, \(\pi_2=4\pi_1\) and \(2\pi_1=3\pi_2\), together force \(\pi_1=\pi_2=0\), and hence \(\boldsymbol{\pi}=(0,0,1)\). Notice that \(\pi_3=\lim_{t\to\infty}p_{23}(t)\).
d) The forward equations \(p_{ij}'(t)=\sum_kp_{ik}(t)\,q_{kj}\) for \(i=1\) read \(p'_{11}(t)=-4p_{11}(t)+p_{12}(t)\), \(p'_{12}(t)=2p_{11}(t)-3p_{12}(t)\), and \(p'_{13}(t)=2p_{11}(t)+2p_{12}(t)\).
e) The backward equations \(p_{ij}'(t)=\sum_kq_{ik}\,p_{kj}(t)\) with \(j=2\) read \(p'_{12}(t)=-4p_{12}(t)+2p_{22}(t)+2p_{32}(t)\), \(p'_{22}(t)=p_{12}(t)-3p_{22}(t)+2p_{32}(t)\) and \(p'_{32}(t)=0\), where the latter equation is compatible with the fact that \(p_{32}(t)=p_{31}(t)=0\) since state \(3\) is absorbing.
f) Notice that the chain \(X_t\) leaves the set \(\{1,2\}\) at rate \(q_{13}=q_{23}=2\); therefore, \(h_3\sim\mathsf{Exp}(2)\) implying that \(\mathsf{P}_2(h_3\le t)=p_{23}(t)=p_{13}(t)=1-e^{-2t}\).

 [exse:recurrence-of-birth-and-death-processes] (***).

Let \(X_t\) be a birth-and-death chain with birth rate \(\lambda_n\) and death rate \(\mu_n\) (at state \(X_t=n\)). Write \(T_j\) for the first hitting time of state \(j\ge0\) and consider the probability \(a_n:=\mathsf{P}_n(T_0<\infty)=\mathsf{P}(T_0<\infty\mid X_0=n)\).
a) Show that the probabilities \(a_n\) satisfy the following system of equations: \[a_n(\lambda_n+\mu_n)=a_{n-1}\mu_n+a_{n+1}\lambda_n\,,\qquad n>0\,.\] b) Deduce that \(a_n=1+\bigl(a_1-1\bigr)\sum_{k=0}^{n-1}\prod_{j=1}^k\tfrac{\mu_j}{\lambda_j}\), and so \(a_n<1\) iff \(a_1<1\).
c) Show that state \(0\) is recurrent iff \(a_n=1\) for all \(n\ge0\). Use the result in part b) to deduce that this is equivalent to \(\sum_{k\ge0}\prod_{j=1}^k\tfrac{\mu_j}{\lambda_j}=\infty\).

a) This is just the formula of total probability at the time of the first jump out of state \(n\).
b) This is immediate from the recurrence relations for the differences \(a_n-a_{n-1}\).
c) Since the whole chain is a single communicating class, it is enough to check whether state \(0\) is recurrent. The latter happens iff \(\mathsf{P}_0(T_1<\infty)\mathsf{P}_1(T_0<\infty)\equiv\mathsf{P}_1(T_0<\infty)=1\), which by part b) is equivalent to \(\sum_{k\ge0}\prod_{j=1}^k\tfrac{\mu_j}{\lambda_j}=\infty\).

 [exse:ctMC-MMk-queue] (**).

The \(M/M/k\) queue (with \(k\ge1\)) is a continuous time Markov chain in \(\mathcal{I}=\{0,1,2,\dots\}\) with arrival rate \(\lambda\) and service rate given by \(\mu n\) if \(n\le k\) and \(\mu k\) for \(n>k\), recall Example 19. Using detailed balance or otherwise, find the corresponding invariant distribution. Deduce that this queue is stable (i.e., has an invariant distribution of total mass one) if and only if \(\lambda<k\mu\). Identify this distribution for the \(M/M/\infty\) queue.

Use the approach of Example 19 with \(\lambda_n\equiv\lambda\) and \(\mu_n=\mu\min(n,k)\). For \(n\le k\) this gives \(\pi_n=\pi_0\tfrac{\lambda^n}{n!\mu^n}\) while for \(n>k\) we have \(\pi_n=\pi_0\tfrac{\lambda^n}{k!k^{n-k}\mu^n}\). Clearly, the summability condition holds iff \(\lambda/(k\mu)<1\).
For \(k=\infty\) servers, this simplifies to \(\pi_n=\pi_0\tfrac{\lambda^n}{n!\mu^n}\) so that \(\pi_0=e^{-\lambda/\mu}\) and the stationary distribution is \(\mathsf{Poi}(\tfrac\lambda\mu)\).
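These formulas are straightforward to check numerically. The sketch below uses illustrative rates \(\lambda=3\), \(\mu=1\), \(k=5\) (stable, since \(\lambda<k\mu\)), builds the weights recursively, and verifies detailed balance together with the Poisson form for the \(M/M/\infty\) queue.

```python
import math

lam, mu, k = 3.0, 1.0, 5      # illustrative rates; stable since lam < k*mu

# Unnormalised detailed-balance weights for the M/M/k queue:
# w_{n} = w_{n-1} * lam / (mu * min(n, k)), truncated at N.
N = 400
w = [1.0]
for n in range(1, N):
    w.append(w[-1] * lam / (mu * min(n, k)))
Z = sum(w)
pi_mmk = [x / Z for x in w]

# largest violation of detailed balance pi_n*lam = pi_{n+1}*mu*min(n+1,k)
balance_gap = max(abs(pi_mmk[n] * lam - pi_mmk[n + 1] * mu * min(n + 1, k))
                  for n in range(N - 1))

# M/M/infinity: the same recursion with service rate mu*n for every n
# should give the Poisson(lam/mu) distribution.
wi = [1.0]
for n in range(1, 60):
    wi.append(wi[-1] * lam / (mu * n))
Zi = sum(wi)
pi_inf = [x / Zi for x in wi]
```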


  1. Notice that since the sum of each row of \(Q\) vanishes, \(0\) is an eigenvalue of each \(Q\)-matrix. Moreover, the sum of all eigenvalues of a square matrix always equals its trace, \({\sf tr}\,Q=\sum_iq_{ii}\); here, \({\sf tr}\,Q=-6=\sum_i\lambda_i\), as it should!↩︎

  2. This result can also be extended to the case when \(Q\) cannot be diagonalised, and thus has repeated eigenvalues, but we will not do this here.↩︎

  3. Notice that it is sufficient to evaluate both sides for each eigenvalue of \(Q\)!↩︎