Models and Methods in Health Data Science

Lectures 7 & 8: Decision Theory and Diagnostic Testing

Rachel Oughton

22/02/2023

1 Overview

In this lecture we will

2 Decision Theory

An important aspect of health economics problems is that a decision needs to be made in the presence of uncertainty.

Such decisions are often made using decision theory.

Decision Theory

A decision analytic model uses


As well as the decisions we make, there are events that are random


This information is combined to deduce the best decision.

Decision Theory

By combining the probabilities of the random events with the rewards and costs of the outcomes, we can calculate the expected outcome and cost associated with each sequence of decisions.


In general in decision theory we express the outcome in terms of utility.

You can read about the link between utility and the QALY in Whitehead and Ali (2010).

Uncertainty

In health economics, the decisions relate to efficacy and resource allocation.


There is uncertainty inherent in these decisions:

Health economics: the big questions

In health economic decision analysis there are two basic questions:

  1. Should this treatment / technology / intervention etc. be adopted, given the existing evidence and the uncertainty surrounding its outcomes?
       • If so, which strategy should be adopted, and for which cohort(s)?

  2. Is more evidence / information required before question 1 can be answered?

We will look at some aspects of each of these.

2.1 Summary

Decision theory addresses the question

“How should one decide what to do when things are uncertain?”

In decision theory, we

We can also use decision theory to answer questions like “How much should we be willing to pay for more information?”

2.2 The ingredients of a decision problem

The ingredients of a decision problem are:

We will illustrate this throughout with the simple example of a lottery ticket.

2.2.1 Decisions

The first task is to list all the possible decisions you can make, often written \(d_1,\ldots,d_k\).

For example, let’s say you’re deciding whether to buy a lottery ticket. Then we might have \[ \begin{align*} d_1: & \text{ Buy a ticket} \\ d_2: & \text{ Don't buy a ticket} \end{align*} \]

In this example, there is only one ‘phase’ of decision-making.


In many problems there might be several points at which a decision needs to be made, and options might depend on what has happened before.

2.2.2 Events

The second set of ingredients is the events, often labelled \(E_1,\ldots,E_m\).


We use this in the probabilistic sense, to mean the set of possible things that might happen after you have made your decision (or between subsequent decisions, if there are multiple phases).


The events should be defined so that exactly one of them will happen: they are mutually exclusive and exhaustive.

In our example about buying a lottery ticket, we might have:

\[ \begin{align*} E_1: & \text{ You win} \\ E_2: & \text{ You don't win} \end{align*} \]

When we make the decision, we don’t know which event will occur.

Uncertainties

At each point where one of a set of events will occur, we need the probability of each event.


These probabilities are often based on information from studies, trials, or other research.

In our lottery example, we might have:

\[ \begin{align*} p\left(E_1\mid{d_1}\right) & = 0.0001 \text{ (winning, given you bought a ticket)} \\ p\left(E_1\mid{d_2}\right) & = 0 \text{ (winning, given you didn't buy a ticket)} \end{align*} \]

2.2.3 Rewards / payoffs

The rewards (or payoffs) are the consequences following each combination of decisions and events.

Note that although these are called ‘rewards’, they can sometimes be bad!

In health economics, these will often be expressed in QALYs.

2.2.4 Costs

Linked to rewards is the idea of costs.

Usually, each decision will incur a cost.

In our example let’s say the cost of buying a lottery ticket is £1, and the reward if you win is £500.

In health economics, there may be other costs associated with outcomes, e.g. ongoing treatment.

2.3 The decision tree

We can put everything together into a decision tree.

A decision tree is made of nodes and branches, arranged to show

Nodes

There are two types of node:

  • decision nodes, where we choose between options
  • chance nodes, where one of a set of events occurs at random


At each node there are then a number of branches, depending on the number of possible events or decisions.


For any combination of decisions and events, we end up at a reward / outcome.

It is important that the pathways are mutually exclusive.

Lottery example: decision tree

The decision tree for our lottery example.

Time goes from left to right

2.4 Solving the Decision tree

The point of the decision tree was to help us to make a decision.

So, how do we use it to do that?

To solve a decision tree we

  1. Combine the probabilities and outcomes to find the expected value of each decision.
  2. Choose the decision(s) that lead to the best expected outcome.

At each decision node, we rule out the options with lower expected values, leaving one path: the path with the best expected outcome.

2.4.1 Backwards induction

We construct the tree from left (the first decision node) to right (the final outcomes).

We solve the tree from right to left.

We work through the tree, removing each node as we go, from right to left.

When we have reached the root decision node, we will have found the optimal sequence of decisions and its expected value.

2.4.2 Lottery example

For our lottery ticket example, we have

The chance node is the furthest to the right, so we will start there.

At the chance node, we have a probability of 0.0001 of winning (a net gain of £499, since the ticket cost £1) and a probability of 0.9999 of not winning (a net loss of £1).

Therefore our expected outcome (in £) is

\[0.0001 \times{499} + 0.9999 \times{-1} = -0.95 \]
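This backwards-induction step can be checked with a few lines of R. This is a minimal sketch, not part of the original example; the probabilities and payoffs are the ones quoted above.

```r
# Lottery example: expected value at the chance node
p_win  <- 0.0001          # p(E1 | d1), from the text
ticket <- 1               # cost of a ticket (pounds)
prize  <- 500             # reward if you win (pounds)

# Net payoffs after buying a ticket: win -> 499, lose -> -1
ev_buy    <- p_win * (prize - ticket) + (1 - p_win) * (-ticket)
ev_no_buy <- 0            # not buying a ticket: no cost, no reward

ev_buy     # -0.95
ev_no_buy  #  0, so the optimal decision is not to buy
```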

Lottery example

Our tree therefore becomes

The decision tree for our lottery example, with the chance node solved.

Lottery example

We can now look at the two branches from the decision node: buying a ticket (\(d_1\)) has expected value −£0.95, while not buying a ticket (\(d_2\)) has value £0.

Therefore we cross out \(d_1\) and our optimal decision is not to buy a ticket.

The decision tree for our lottery example, with the chance node solved.

2.5 Expected value of perfect information (EVPI)

We can make decisions, but we don’t know what will happen at each chance node.

However, suppose we had the option to find out what will happen at each chance node before deciding: this would almost certainly have a big impact on our decision-making!

Expected value of perfect information (EVPI)

The expected value of perfect information is the difference between the expected outcome when we learn what will happen before deciding, and the expected outcome of the best decision without that information.

EVPI is calculated from the perspective of deciding whether or not to pay to gain the perfect information.

Lottery example: EVPI

For our lottery example:

The probabilities of the information revealing each event are the same as the probabilities of the event.

So our expected outcome with perfect information is:

\[ 0.0001\times{£499} + 0.9999\times{£0} = £0.0499.\]

Our optimal decision before (not to buy a ticket) had expected outcome £0.

So our expected value of perfect information is

\[ EVPI = £0.0499 - £0 = £0.0499. \]
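As a check on this calculation, here is a short R sketch using the same numbers; with perfect information we only buy the ticket when told it will win.

```r
# Expected value of perfect information (EVPI), lottery example
p_win  <- 0.0001
prize  <- 500
ticket <- 1

# With perfect information we only buy when told the ticket will win
ev_perfect <- p_win * (prize - ticket) + (1 - p_win) * 0   # 0.0499

# Best expected value without the information (not buying: 0)
ev_current <- max(p_win * (prize - ticket) + (1 - p_win) * (-ticket), 0)

evpi <- ev_perfect - ev_current
evpi   # 0.0499, i.e. about 5p
```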

3 Decision analysis for treatments

In this section:

3.1 Example: Angina operation

Williams (1985) presents several scenarios for a certain class of angina patients, comparing two options:

  • coronary artery bypass grafting (an operation)
  • standard ongoing medical management.

We will go through a simplified version, but the paper is a useful reference.

Possible outcomes from the operation in QALYs, taken from Williams (1985).

Angina example

These are the probabilities of those outcomes if the operation is performed:

The decision tree for our angina example. The QALY values have been estimated by eye from Williams (1985).

Angina example: backward induction

We can solve this with backward induction as before.

For the chance node following the operation, we have an expected outcome of

\[ \underbrace{0.67\times{9}}_{\text{Improvement}} + \underbrace{0.3\times{4.4}}_{\text{No change}} + \underbrace{0.03 \times{0}}_{\text{Operative mortality}} = 7.35 \text{ QALYs.}\]

The expected outcome of staying on the medical management plan is 4.4 QALYs.

Therefore the optimal decision is to have the operation.

Incorporating costs

So far we have just considered QALYs.


In fact, each decision option has associated with it a cost.

In this case, the operation costs £3000 and ongoing medical management costs £500.

Therefore our incremental cost-effectiveness ratio is

\[ ICER = \frac{3000-500}{7.35 - 4.4} = \frac{2500}{2.95} = 847.46, \]

and if our willingness-to-pay threshold is above £847.46 per QALY, we will perform the operation.
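A minimal R sketch of the same calculation, using the probabilities and QALYs from the tree and the costs implied by the ICER above:

```r
# Angina example: expected QALYs and ICER
p_outcome <- c(improve = 0.67, no_change = 0.30, op_death = 0.03)
qaly_op   <- c(improve = 9,    no_change = 4.4,  op_death = 0)

eq_operation <- sum(p_outcome * qaly_op)   # 7.35 QALYs
eq_medical   <- 4.4                        # QALYs on medical management

cost_operation <- 3000   # pounds, as implied by the ICER formula above
cost_medical   <- 500

icer <- (cost_operation - cost_medical) / (eq_operation - eq_medical)
icer   # about 847 pounds per QALY gained
```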

Angina example: EVPI

How much would we pay in this case for perfect information?

Our outcome with perfect information would be

\[\underbrace{0.67\times{9}}_{\text{Operation successful}} + \underbrace{0.3\times{4.4}}_{\text{No change}} + \underbrace{0.03\times{4.4}}_{\text{Operative mortality}} = 7.482.\]

This is \(7.482 - 7.35 = 0.132\) QALYs more than our expected outcome.

Therefore we will pay up to 0.132 times our willingness-to-pay threshold for perfect information.
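The EVPI can be turned into a monetary figure by multiplying by the willingness-to-pay threshold. The sketch below does this for an illustrative threshold of £20,000 per QALY; that figure is not taken from Williams (1985) and is only there to show the arithmetic.

```r
# Angina example: EVPI in QALYs and in pounds
p_outcome <- c(improve = 0.67, no_change = 0.30, op_death = 0.03)
qaly_op   <- c(improve = 9,    no_change = 4.4,  op_death = 0)
qaly_med  <- 4.4

# Without perfect information: take the better expected outcome
eq_no_info <- max(sum(p_outcome * qaly_op), qaly_med)     # 7.35

# With perfect information: for each outcome, pick the better option
eq_perfect <- sum(p_outcome * pmax(qaly_op, qaly_med))    # 7.482

evpi_qaly <- eq_perfect - eq_no_info                      # 0.132 QALYs

wtp <- 20000            # hypothetical willingness-to-pay (pounds per QALY)
evpi_qaly * wtp         # up to 2640 pounds for perfect information
```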

A much more realistic (and complex!) EVPI example can be found in McCullagh, Walsh, and Barry (2012).

4 Diagnostic testing

In this section:

Diagnostic testing

The aim of performing diagnostic tests is to determine whether or not a person has a particular disease or condition.

We want to avoid both false positives and false negatives.

Diagnostic tests

For a given condition there may be a range of different diagnostic tests, each with different costs and accuracies.

Think of the more costly but more accurate PCR test compared to the less expensive but less accurate LFD test.

Rautenberg, Gerritsen, and Downes (2020) give a review of the use of decision theory in diagnostic testing, as well as setting forward good practice.

Notation

We will use \(T\) to denote the test result and \(D\) to denote the disease status, with superscripts \(+\) and \(-\) for positive/present and negative/absent.

For example, \(T^-\) means that a test is negative, whereas \(D^+\) means the disease/condition is present.

4.1 Measures of test accuracy

Two measures are important when considering diagnostic test accuracy:

  • sensitivity, \(p\left(T^+\mid{D^+}\right)\): the probability of a positive test given that the disease is present
  • specificity, \(p\left(T^-\mid{D^-}\right)\): the probability of a negative test given that the disease is absent

As sensitivity increases, specificity often decreases.

At the extreme, a test that classifies everyone as positive has sensitivity 1 but specificity 0.

Clearly this would not be a very useful test.

4.1.1 The Confusion Matrix

We can visualise these measures using a confusion matrix.

              Test positive           Test negative
Disease       True positives (TP)    False negatives (FN)
No disease    False positives (FP)   True negatives (TN)
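Sensitivity and specificity can be read straight off the confusion matrix. A small R sketch, using illustrative counts that are not from any study in these notes:

```r
# Sensitivity and specificity from a confusion matrix (illustrative counts)
tp <- 90;  fn <- 10    # among people with the disease
fp <- 160; tn <- 640   # among people without the disease

sensitivity <- tp / (tp + fn)   # p(T+ | D+) = 0.9
specificity <- tn / (tn + fp)   # p(T- | D-) = 0.8

sensitivity; specificity
```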

4.2 Predictive value

What we really want to know is

\[p\left(D\mid{T}\right),\]

the probability that someone has the disease, given their test result.


We use Bayes theorem

For two events \(A\) and \(B\), with \(p\left(B\right)>0\),

\[ p\left(A\mid{B}\right) = \frac{p\left(B\mid{A}\right)p\left(A\right)}{p\left(B\right)}.\]

Bayes theorem for predictive value

For us, this becomes (for example)

\[p\left(D^+\mid{T^+}\right) = \frac{p\left(T^+\mid{D^+}\right)p\left(D^+\right)}{p\left(T^+\right)}.\]

Therefore we need:

Bayes theorem for predictive value

Two more measures of diagnostic accuracy:

  • the positive predictive value (PPV), \(p\left(D^+\mid{T^+}\right)\)
  • the negative predictive value (NPV), \(p\left(D^-\mid{T^-}\right)\)

Because of the dependence on the prevalence, these quantities may need to be re-calculated often.

Example: calculating predictive value

Suppose a diagnostic test for a particular disease has sensitivity 0.99 and specificity 0.8. That is, \[p\left(T^+\mid{D^+}\right) = 0.99\] and \[p\left(T^-\mid{D^-}\right)=0.8.\] The prevalence of the disease in the population is 1%, that is \[ p\left(D^+\right) = 0.01.\]

Example: calculating predictive value

We first need to calculate \(p\left(T^+\right)\), using the partition theorem:

\[\begin{align*} p\left(T^+\right) & = p\left(T^+\mid{D^+}\right)p\left(D^+\right) + p\left(T^+\mid{D^-}\right)p\left(D^-\right)\\ &= 0.99 \times{0.01} + \left(1-p\left(T^-\mid{D^-}\right)\right)p\left(D^-\right)\\ & = 0.99 \times{0.01} + 0.2 \times{0.99}\\ & = 0.0099 + 0.198\\ & = 0.2079. \end{align*}\]
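The same calculation in R, using only the quantities given above:

```r
# p(T+) by the partition theorem (law of total probability)
sens <- 0.99    # p(T+ | D+)
spec <- 0.80    # p(T- | D-)
prev <- 0.01    # p(D+)

p_t_pos <- sens * prev + (1 - spec) * (1 - prev)
p_t_pos   # 0.2079
```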

4.2.1 Exercise

  1. Take a guess at what you think \(p\left(D^+\mid{T^+}\right)\) and \(p\left(D^-\mid{T^-}\right)\) will be.
  2. Use the values given and Bayes theorem to calculate them.

Reminder

\[\begin{align*} p\left(T^+\mid{D^+}\right) & = 0.99\\ p\left(T^-\mid{D^-}\right)& =0.8\\ p\left(T^+\right) & = 0.2079 \\ p\left(D^+\right) & = 0.01 \end{align*}\]

Bayes Theorem: \[p\left(D^+\mid{T^+}\right) = \frac{p\left(T^+\mid{D^+}\right)p\left(D^+\right)}{p\left(T^+\right)}\]

The prosecutor’s fallacy

In this example, even though \(p\left(T^+\mid{D^+}\right)\) is high, \(p\left(D^+\mid{T^+}\right)\) is low.

To see this:

\[\begin{align*} p\left(T^+\right) & = p\left(T^+\mid{D^+}\right)p\left(D^+\right) + p\left(T^+\mid{D^-}\right)p\left(D^-\right)\\ & = \underbrace{0.99 \times{0.01}}_{\text{Very few true positives}} + \underbrace{0.2 \times{0.99}}_{\text{Many false positives}} \end{align*}\]
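If you want to check your answers to the exercise, both predictive values follow directly from Bayes theorem; here is a minimal R sketch using the numbers above.

```r
# Positive and negative predictive value via Bayes theorem
sens <- 0.99; spec <- 0.80; prev <- 0.01

p_t_pos <- sens * prev + (1 - spec) * (1 - prev)   # 0.2079
p_t_neg <- 1 - p_t_pos                             # 0.7921

ppv <- sens * prev / p_t_pos                       # p(D+ | T+), about 0.048
npv <- spec * (1 - prev) / p_t_neg                 # p(D- | T-), about 0.9999

ppv; npv
```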

More generally

The sensitivity \(p\left(T^+\mid{D^+}\right)\) is often confused with the positive predictive value \(p\left(D^+\mid{T^+}\right)\).

This is known as the Prosecutor’s fallacy.

4.3 Decision trees for diagnostic testing

The decision tree structure is often used as a calculation tool (i.e. with no decisions to be made).


We will use this to think about diagnostic tests.

Decision trees for diagnostic testing

A ‘decision’ tree for diagnostics.

4.3.1 Example

Suppose some disease has prevalence \(p\left(D^+\right)= 0.2\), and we know that the test has sensitivity \(p\left(T^+\mid{D^+}\right) = 0.86\) and specificity \(p\left(T^-\mid{D^-}\right) = 0.7\).

We can then fill in the tree as follows:

A disease-based approach.

Decisions for diagnostics

These diagnostic results can be used in two (probably more) ways:

The resulting probabilities can feed into another decision analysis.

For example, a gold standard test could be a source of [not quite] perfect information.

4.4 Sequential diagnostic testing

Suppose we have two tests: a cheaper, less accurate test (\(T_1\)) and a more expensive, more accurate test (\(T_2\)).

In sequential diagnostic testing, everyone is given the first test, and only those who test positive go on to the second test.

4.4.1 Example continued:

Let’s imagine that the diagnostic test from our example is the first test (\(T_1\)) in a sequential testing plan.

If 1000 people are tested, we expect:

  • 172 to be true positives
  • 28 to be false negatives
  • 240 to be false positives
  • 560 to be true negatives
              Test positive   Test negative
Disease            172              28
No disease         240             560


Only the \(172 + 240 = 412\) with a positive result would be sent for the second test \(T_2\).

Note that even with this rather high disease prevalence of 0.2, less than half of the positive tests are correct.
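These counts can be reproduced with a few lines of R from the prevalence of 0.2 and a sensitivity of 0.86 and specificity of 0.70 for \(T_1\) (the values implied by the table above):

```r
# Expected counts for 1000 people tested with T1
n     <- 1000
prev  <- 0.2
sens1 <- 0.86   # implied by 172 / 200
spec1 <- 0.70   # implied by 560 / 800

n_dis    <- n * prev            # 200 with the disease
n_no_dis <- n * (1 - prev)      # 800 without

tp <- n_dis    * sens1          # 172 true positives
fn <- n_dis    * (1 - sens1)    #  28 false negatives
fp <- n_no_dis * (1 - spec1)    # 240 false positives
tn <- n_no_dis * spec1          # 560 true negatives

c(TP = tp, FN = fn, FP = fp, TN = tn)
tp / (tp + fp)                  # proportion of positives that are correct, ~0.42
```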

Example - a second test

Let’s say that for the second test \(T_2\) we have sensitivity \(p\left(T^+\mid{D^+}\right) = 0.95\) and specificity \(p\left(T^-\mid{D^-}\right) = 0.8\).

We assume here that the results of \(T_1\) and \(T_2\) are conditionally independent given the disease state.

Example - sequential testing

Sequential testing.

Example: sequential testing

From this we can compare the results after just the first test with the results following both tests.

Test 1 only:

                 D+       D-
T1 positive      172      240
T1 negative       28      560

Sequential testing:

                 D+       D-
Test positive    163.4     48
Test negative     36.6    752
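The sequential-testing counts can be reproduced in the same way, assuming \(T_2\) has sensitivity 0.95 and specificity 0.8 (the values implied by the table) and that the two tests are conditionally independent given disease status:

```r
# Sequential testing: only T1-positives are given T2
sens2 <- 0.95   # implied by 163.4 / 172
spec2 <- 0.80   # implied by 192 / 240

tp1 <- 172; fp1 <- 240; fn1 <- 28; tn1 <- 560   # counts after T1

# Final positives need a positive result on both tests
tp_final <- tp1 * sens2                  # 163.4
fp_final <- fp1 * (1 - spec2)            #  48

# Final negatives: negative on T1, or positive on T1 then negative on T2
fn_final <- fn1 + tp1 * (1 - sens2)      #  36.6
tn_final <- tn1 + fp1 * spec2            # 752

rbind(Test_pos = c(D_pos = tp_final, D_neg = fp_final),
      Test_neg = c(D_pos = fn_final, D_neg = tn_final))
```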

Using the sequential testing

Whether this is acceptable would depend on:

These calculations often feed into health economic models.

There is more on this, and on some of the difficulties surrounding decision making with diagnostic tests, in Sutton et al. (2008).

5 Receiver-operating characteristic (ROC) analysis

An important topic in diagnostics (and classification generally)!

In this section:

Receiver-operating characteristic (ROC) analysis

So far we have assumed a diagnostic test outputs positive or negative.
In most cases, however, a measurement (from which a test is derived) gives a continuous value.

ROC analysis was developed during WW2 to assess the accuracy of radar operators.

Continuous test measurements

Suppose we take some measurement from a number of people.
Some of the people have disease D, the others don’t.

Probability distributions of a measurement for people with (D) and without (No D) a disease.

Continuous test measurements

Suppose we set our decision threshold \(T=0\)

We see

New threshold

Now we set \(T=0.5\)

ROC space

Any value of \(T\) produces a pair (Sensitivity, Specificity), which we can plot

ROC curve

If we vary the decision threshold continuously, we can produce a ROC curve:

The ROC curve for the measurement shown in Figures 5.1-5.3. AUC is ‘area under the curve’, an idea we will explore shortly.

ROC curve

Each point on the ROC curve corresponds to a value of the decision threshold


ROC analysis

We will explore two main aspects of ROC analysis:

  • overall diagnostic performance, summarised by the area under the curve (AUC)
  • choosing a value for the decision threshold \(T\)

5.1 Overall diagnostic performance

The shape of the ROC curve is determined by the degree of separation in the measurements

This shape of ROC curve indicates an ideal diagnostic, and the area under the curve (AUC) is 1 (the best it can be).

Good separation in distributions of a measurement for people with (D) and without (No D) a disease.

The ROC curve for the measurement with good separation.

The diagonal line with \(\text{AUC}=0.5\) corresponds to random guessing, or complete overlap between the two distributions.

AUC - poor separation

Overlapping measurement distributions give a much worse classifier.

Poor separation in distributions of a measurement for people with (D) and without (No D) a disease.

The ROC curve for the measurement with poor separation.

Some values of \(T\) are better than others, but none is perfect.

Area under the curve

In summary:

5.2 Choosing a value for \(T\)

We will often still want to classify someone as ‘positive’ or ‘negative’

This means we need a value for the decision threshold.

The optimal value will depend on:

We will assume an equal balance between sensitivity and specificity, but these methods can be weighted.

5.2.1 Two methods (of many)

Youden’s index:
The best decision threshold according to Youden’s index is the value of \(T\) that maximises \(J(T)\):

\[J\left(T\right) = \operatorname{sensitivity}\left(T\right) + \operatorname{specificity}\left(T\right) - 1\]



Distance from (0,1):
The best decision threshold by this method is the value of \(T\) that minimises the distance \(D(T)\) to the top left corner (the ideal point):

\[D\left(T\right) = \sqrt{\left(1-\operatorname{specificity}\left(T\right)\right)^2 + \left(1-\operatorname{sensitivity}\left(T\right)\right)^2}.\]
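The sketch below evaluates both criteria over a grid of thresholds, assuming (purely for illustration, not from the lecture's figures) that the measurement is N(1, 1) in people with the disease and N(−1, 1) in people without, with a result called positive when it exceeds the threshold.

```r
# Choosing a threshold: Youden's index vs distance from (0, 1)
# Illustrative assumption: measurement ~ N(1, 1) for D, N(-1, 1) for no D.
thresholds  <- seq(-3, 3, by = 0.01)
sensitivity <- 1 - pnorm(thresholds, mean =  1, sd = 1)  # p(T+ | D+)
specificity <-     pnorm(thresholds, mean = -1, sd = 1)  # p(T- | D-)

youden <- sensitivity + specificity - 1
dist01 <- sqrt((1 - specificity)^2 + (1 - sensitivity)^2)

thresholds[which.max(youden)]   # best threshold by Youden's index
thresholds[which.min(dist01)]   # best threshold by distance from (0, 1)
```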

Poor separation example

Our original example

5.3 Example: ROC analysis with data

So far we have been very theoretical. However:

ROC analysis doesn’t rely on any distributional assumptions!

A dataset

100 measurements - coloured by whether each person has the disease.

The ROC curve is not affected by unequal numbers of people with and without the disease.

5.3.1 Example: ROC curve

## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
The ROC curve for our empirical data.
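The original dataset is not included in these notes, but the same kind of curve can be produced with the pROC package on simulated stand-in data; the ‘Setting levels’ and ‘Setting direction’ messages above are the ones pROC prints by default.

```r
# Empirical ROC curve with the pROC package (simulated stand-in data)
library(pROC)

set.seed(1)
n <- 100
disease     <- rbinom(n, 1, 0.3)                     # 0 = no disease, 1 = disease
measurement <- rnorm(n, mean = ifelse(disease == 1, 1, -1), sd = 1)

roc_obj <- roc(response = disease, predictor = measurement)
auc(roc_obj)              # area under the empirical ROC curve
plot(roc_obj)             # draw the ROC curve
coords(roc_obj, "best")   # threshold chosen by Youden's index (pROC default)
```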

We have only skimmed the surface of ROC analysis, but you can read more about it in Zou, O’Malley, and Mauri (2007) and Fawcett (2006).

6 Summary

In this lecture we have studied:

Decision trees

However, there is no natural way to build time into decision trees…


Diagnostic testing

7 References

Fawcett, Tom. 2006. “An Introduction to ROC Analysis.” Pattern Recognition Letters 27 (8): 861–74.
McCullagh, Laura, Cathal Walsh, and Michael Barry. 2012. “Value-of-Information Analysis to Reduce Decision Uncertainty Associated with the Choice of Thromboprophylaxis After Total Hip Replacement in the Irish Healthcare Setting.” Pharmacoeconomics 30 (10): 941–59.
Rautenberg, Tamlyn, Annette Gerritsen, and Martin Downes. 2020. “Health Economic Decision Tree Models of Diagnostics for Dummies: A Pictorial Primer.” Diagnostics 10 (3): 158.
Sutton, Alexander J, Nicola J Cooper, Steve Goodacre, and Matthew Stevenson. 2008. “Integration of Meta-Analysis and Economic Decision Modeling for Evaluating Diagnostic Tests.” Medical Decision Making 28 (5): 650–67.
Whitehead, Sarah J, and Shehzad Ali. 2010. “Health Outcomes in Economic Evaluation: The QALY and Utilities.” British Medical Bulletin 96 (1): 5–21.
Williams, Alan. 1985. “Economics of Coronary Artery Bypass Grafting.” Br Med J (Clin Res Ed) 291 (6491): 326–29.
Zou, Kelly H, A James O’Malley, and Laura Mauri. 2007. “Receiver-Operating Characteristic Analysis for Evaluating Diagnostic Tests and Predictive Models.” Circulation 115 (5): 654–57.